aboutsummaryrefslogtreecommitdiff
path: root/src/unicode
Commit message (Collapse)AuthorAge
* refactor(multibyte): replace generated unicode tables with utf8procbfredl2024-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit intentionally aims at preserving existing behavior as much as possible while replacing our build step to convert unicode data files into binary tables, which corresponding lookups in utf8proc. Actual improvements in behavior will be a followup. The only change in behavior is that 'emoji' option will turn some more codepoints into double with. Nvim used the "Emoji" and "Emoji_Presentation" properties to define emojis, while utf8proc only exposes the Extended_Pictographic property from the emoji table. This is a superset of the previous emoji properties. As only codepoints above 0x1f000 are affected by the 'emoji' option, this means that the following chars are now treated as double-width, instead of single-width like in previous nvim versions: ๐Ÿ€€ ๐Ÿ€ ๐Ÿ€‚ ๐Ÿ€ƒ ๐Ÿ€… ๐Ÿ€† ๐Ÿ€‡ ๐Ÿ€ˆ ๐Ÿ€‰ ๐Ÿ€Š ๐Ÿ€‹ ๐Ÿ€Œ ๐Ÿ€ ๐Ÿ€Ž ๐Ÿ€ ๐Ÿ€ ๐Ÿ€‘ ๐Ÿ€’ ๐Ÿ€“ ๐Ÿ€” ๐Ÿ€• ๐Ÿ€– ๐Ÿ€— ๐Ÿ€˜ ๐Ÿ€™ ๐Ÿ€š ๐Ÿ€› ๐Ÿ€œ ๐Ÿ€ ๐Ÿ€ž ๐Ÿ€Ÿ ๐Ÿ€  ๐Ÿ€ก ๐Ÿ€ข ๐Ÿ€ฃ ๐Ÿ€ค ๐Ÿ€ฅ ๐Ÿ€ฆ ๐Ÿ€ง ๐Ÿ€จ ๐Ÿ€ฉ ๐Ÿ€ช ๐Ÿ€ซ ๐Ÿ€ฐ ๐Ÿ€ฑ ๐Ÿ€ฒ ๐Ÿ€ณ ๐Ÿ€ด ๐Ÿ€ต ๐Ÿ€ถ ๐Ÿ€ท ๐Ÿ€ธ ๐Ÿ€น ๐Ÿ€บ ๐Ÿ€ป ๐Ÿ€ผ ๐Ÿ€ฝ ๐Ÿ€พ ๐Ÿ€ฟ ๐Ÿ€ ๐Ÿ ๐Ÿ‚ ๐Ÿƒ ๐Ÿ„ ๐Ÿ… ๐Ÿ† ๐Ÿ‡ ๐Ÿˆ ๐Ÿ‰ ๐ŸŠ ๐Ÿ‹ ๐ŸŒ ๐Ÿ ๐ŸŽ ๐Ÿ ๐Ÿ ๐Ÿ‘ ๐Ÿ’ ๐Ÿ“ ๐Ÿ” ๐Ÿ• ๐Ÿ– ๐Ÿ— ๐Ÿ˜ ๐Ÿ™ ๐Ÿš ๐Ÿ› ๐Ÿœ ๐Ÿ ๐Ÿž ๐ŸŸ ๐Ÿ  ๐Ÿก ๐Ÿข ๐Ÿฃ ๐Ÿค ๐Ÿฅ ๐Ÿฆ ๐Ÿง ๐Ÿจ ๐Ÿฉ ๐Ÿช ๐Ÿซ ๐Ÿฌ ๐Ÿญ ๐Ÿฎ ๐Ÿฏ ๐Ÿฐ ๐Ÿฑ ๐Ÿฒ ๐Ÿณ ๐Ÿด ๐Ÿต ๐Ÿถ ๐Ÿท ๐Ÿธ ๐Ÿน ๐Ÿบ ๐Ÿป ๐Ÿผ ๐Ÿฝ ๐Ÿพ ๐Ÿฟ ๐Ÿ‚€ ๐Ÿ‚ ๐Ÿ‚‚ ๐Ÿ‚ƒ ๐Ÿ‚„ ๐Ÿ‚… ๐Ÿ‚† ๐Ÿ‚‡ ๐Ÿ‚ˆ ๐Ÿ‚‰ ๐Ÿ‚Š ๐Ÿ‚‹ ๐Ÿ‚Œ ๐Ÿ‚ ๐Ÿ‚Ž ๐Ÿ‚ ๐Ÿ‚ ๐Ÿ‚‘ ๐Ÿ‚’ ๐Ÿ‚“ ๐Ÿ‚  ๐Ÿ‚ก ๐Ÿ‚ข ๐Ÿ‚ฃ ๐Ÿ‚ค ๐Ÿ‚ฅ ๐Ÿ‚ฆ ๐Ÿ‚ง ๐Ÿ‚จ ๐Ÿ‚ฉ ๐Ÿ‚ช ๐Ÿ‚ซ ๐Ÿ‚ฌ ๐Ÿ‚ญ ๐Ÿ‚ฎ ๐Ÿ‚ฑ ๐Ÿ‚ฒ ๐Ÿ‚ณ ๐Ÿ‚ด ๐Ÿ‚ต ๐Ÿ‚ถ ๐Ÿ‚ท ๐Ÿ‚ธ ๐Ÿ‚น ๐Ÿ‚บ ๐Ÿ‚ป ๐Ÿ‚ผ ๐Ÿ‚ฝ ๐Ÿ‚พ ๐Ÿ‚ฟ ๐Ÿƒ ๐Ÿƒ‚ ๐Ÿƒƒ ๐Ÿƒ„ ๐Ÿƒ… ๐Ÿƒ† ๐Ÿƒ‡ ๐Ÿƒˆ ๐Ÿƒ‰ ๐ŸƒŠ ๐Ÿƒ‹ ๐ŸƒŒ ๐Ÿƒ ๐ŸƒŽ ๐Ÿƒ‘ ๐Ÿƒ’ ๐Ÿƒ“ ๐Ÿƒ” ๐Ÿƒ• ๐Ÿƒ– ๐Ÿƒ— ๐Ÿƒ˜ ๐Ÿƒ™ ๐Ÿƒš ๐Ÿƒ› ๐Ÿƒœ ๐Ÿƒ ๐Ÿƒž ๐ŸƒŸ ๐Ÿƒ  ๐Ÿƒก ๐Ÿƒข ๐Ÿƒฃ ๐Ÿƒค ๐Ÿƒฅ ๐Ÿƒฆ ๐Ÿƒง ๐Ÿƒจ ๐Ÿƒฉ ๐Ÿƒช ๐Ÿƒซ ๐Ÿƒฌ ๐Ÿƒญ ๐Ÿƒฎ ๐Ÿƒฏ ๐Ÿƒฐ ๐Ÿƒฑ ๐Ÿƒฒ ๐Ÿƒณ ๐Ÿƒด ๐Ÿƒต ๐Ÿ„ ๐Ÿ„Ž ๐Ÿ„ ๐Ÿ„ฏ ๐Ÿ…ฌ ๐Ÿ…ญ ๐Ÿ…ฎ ๐Ÿ…ฏ ๐Ÿ†ญ ๐ŸŒข ๐ŸŒฃ ๐ŸŽ” ๐ŸŽ• ๐ŸŽ˜ ๐ŸŽœ ๐ŸŽ ๐Ÿฑ ๐Ÿฒ ๐Ÿถ ๐Ÿ“พ ๐Ÿ•† ๐Ÿ•‡ ๐Ÿ•ˆ ๐Ÿ• ๐Ÿ•จ ๐Ÿ•ฉ ๐Ÿ•ช ๐Ÿ•ซ ๐Ÿ•ฌ ๐Ÿ•ญ ๐Ÿ•ฎ ๐Ÿ•ฑ ๐Ÿ•ฒ ๐Ÿ•ป ๐Ÿ•ผ ๐Ÿ•ฝ ๐Ÿ•พ ๐Ÿ•ฟ ๐Ÿ–€ ๐Ÿ– ๐Ÿ–‚ ๐Ÿ–ƒ ๐Ÿ–„ ๐Ÿ–… ๐Ÿ–† ๐Ÿ–ˆ ๐Ÿ–‰ ๐Ÿ–Ž ๐Ÿ– ๐Ÿ–‘ ๐Ÿ–’ ๐Ÿ–“ ๐Ÿ–” ๐Ÿ–— ๐Ÿ–˜ ๐Ÿ–™ ๐Ÿ–š ๐Ÿ–› ๐Ÿ–œ ๐Ÿ– ๐Ÿ–ž ๐Ÿ–Ÿ ๐Ÿ–  ๐Ÿ–ก ๐Ÿ–ข ๐Ÿ–ฃ ๐Ÿ–ฆ ๐Ÿ–ง ๐Ÿ–ฉ ๐Ÿ–ช ๐Ÿ–ซ ๐Ÿ–ฌ ๐Ÿ–ญ ๐Ÿ–ฎ ๐Ÿ–ฏ ๐Ÿ–ฐ ๐Ÿ–ณ ๐Ÿ–ด ๐Ÿ–ต ๐Ÿ–ถ ๐Ÿ–ท ๐Ÿ–ธ ๐Ÿ–น ๐Ÿ–บ ๐Ÿ–ป ๐Ÿ–ฝ ๐Ÿ–พ ๐Ÿ–ฟ ๐Ÿ—€ ๐Ÿ— ๐Ÿ—… ๐Ÿ—† ๐Ÿ—‡ ๐Ÿ—ˆ ๐Ÿ—‰ ๐Ÿ—Š ๐Ÿ—‹ ๐Ÿ—Œ ๐Ÿ— ๐Ÿ—Ž ๐Ÿ— ๐Ÿ— ๐Ÿ—” ๐Ÿ—• ๐Ÿ—– ๐Ÿ—— ๐Ÿ—˜ ๐Ÿ—™ ๐Ÿ—š ๐Ÿ—› ๐Ÿ—Ÿ ๐Ÿ—  ๐Ÿ—ข ๐Ÿ—ค ๐Ÿ—ฅ ๐Ÿ—ฆ ๐Ÿ—ง ๐Ÿ—ฉ ๐Ÿ—ช ๐Ÿ—ซ ๐Ÿ—ฌ ๐Ÿ—ญ ๐Ÿ—ฎ ๐Ÿ—ฐ ๐Ÿ—ฑ ๐Ÿ—ฒ ๐Ÿ—ด ๐Ÿ—ต ๐Ÿ—ถ ๐Ÿ—ท ๐Ÿ—ธ ๐Ÿ—น ๐Ÿ›† ๐Ÿ›‡ ๐Ÿ›ˆ ๐Ÿ›‰ ๐Ÿ›Š ๐Ÿ›“ ๐Ÿ›” ๐Ÿ›ฆ ๐Ÿ›ง ๐Ÿ›จ ๐Ÿ›ช ๐Ÿ›ฑ ๐Ÿ›ฒ ๐Ÿด ๐Ÿต ๐Ÿถ ๐Ÿป ๐Ÿผ ๐Ÿฝ ๐Ÿพ ๐Ÿฟ ๐ŸŸ• ๐ŸŸ– ๐ŸŸ— ๐ŸŸ˜ ๐ŸŸ™ ๐Ÿขฐ ๐Ÿขฑ ๐Ÿจ€ ๐Ÿจ ๐Ÿจ‚ ๐Ÿจƒ ๐Ÿจ„ ๐Ÿจ… ๐Ÿจ† ๐Ÿจ‡ ๐Ÿจˆ ๐Ÿจ‰ ๐ŸจŠ ๐Ÿจ‹ ๐ŸจŒ ๐Ÿจ ๐ŸจŽ ๐Ÿจ ๐Ÿจ ๐Ÿจ‘ ๐Ÿจ’ ๐Ÿจ“ ๐Ÿจ” ๐Ÿจ• ๐Ÿจ– ๐Ÿจ— ๐Ÿจ˜ ๐Ÿจ™ ๐Ÿจš ๐Ÿจ› ๐Ÿจœ ๐Ÿจ ๐Ÿจž ๐ŸจŸ ๐Ÿจ  ๐Ÿจก ๐Ÿจข ๐Ÿจฃ ๐Ÿจค ๐Ÿจฅ ๐Ÿจฆ ๐Ÿจง ๐Ÿจจ ๐Ÿจฉ ๐Ÿจช ๐Ÿจซ ๐Ÿจฌ ๐Ÿจญ ๐Ÿจฎ ๐Ÿจฏ ๐Ÿจฐ ๐Ÿจฑ ๐Ÿจฒ ๐Ÿจณ ๐Ÿจด ๐Ÿจต ๐Ÿจถ ๐Ÿจท ๐Ÿจธ ๐Ÿจน ๐Ÿจบ ๐Ÿจป ๐Ÿจผ ๐Ÿจฝ ๐Ÿจพ ๐Ÿจฟ ๐Ÿฉ€ ๐Ÿฉ ๐Ÿฉ‚ ๐Ÿฉƒ ๐Ÿฉ„ ๐Ÿฉ… ๐Ÿฉ† ๐Ÿฉ‡ ๐Ÿฉˆ ๐Ÿฉ‰ ๐ŸฉŠ ๐Ÿฉ‹ ๐ŸฉŒ ๐Ÿฉ ๐ŸฉŽ ๐Ÿฉ ๐Ÿฉ ๐Ÿฉ‘ ๐Ÿฉ’ ๐Ÿฉ“ ๐Ÿฉ  ๐Ÿฉก ๐Ÿฉข ๐Ÿฉฃ ๐Ÿฉค ๐Ÿฉฅ ๐Ÿฉฆ ๐Ÿฉง ๐Ÿฉจ ๐Ÿฉฉ ๐Ÿฉช ๐Ÿฉซ ๐Ÿฉฌ ๐Ÿฉญ
* refactor!: use utf8proc full casefoldingdundargoc2024-08-07
| | | | | | | | | | | | | | | | | | | | According to `CaseFolding-15.1.0.txt`, full casefolding should be preferred over simple casefolding as it's considered to be more correct. Since utf8proc already provides full casefolding it makes sense to switch to it. This will also remove a lot of unnecessary build code. Temporary exceptions are made for two sets characters: - `รŸ` will still be considered `รŸ` (instead of `ss`) as using a full casefolding requires interfering with upstream spell files in some form. - `ฤฐ` will still be considered `ฤฐ` (instead of `iฬ‡`) as using full casefolding requires making a value judgement on the "correct" behavior. There are two, equally valid case-insensetive comparison for this character according to unicode. It is essentially up to the implementor to decide which conversion is correct. For this reason it might make sense to allow users to decide which conversion should be done as an added option to `casemap` in a future PR.
* feat: update unicode tables (#27317)zeertzjq2024-02-04
|
* docs: fix/remove invalid URLs #20647Justin M. Keyes2022-10-14
|
* feat: update unicode tablesbfredl2022-09-30
|
* feat: update unicode tables #19135Justin M. Keyes2022-06-28
|
* build: move unicode/ to src/unicode/Justin M. Keyes2022-06-28