path: root/src/nvim/mbyte.c
Commit message (Collapse)AuthorAge
* refactor: iwyu #31637Justin M. Keyes2024-12-23
| | | Result of `make iwyu` (after some "fixups").
* refactor(options): autogenerate valid values and flag enums for options (#31089)Famiu Haque2024-11-23
| Problem: Option metadata like list of valid values for an option and option flags are not listed in the `options.lua` file and are instead manually defined in C, which means option metadata is split between several places.
|
| Solution: Put metadata such as list of valid values for an option and option flags in `options.lua`, and autogenerate the corresponding C variables and enums.
|
| Supersedes #28659
|
| Co-authored-by: glepnir <glephunter@gmail.com>
* docs: misc (#31138)dundargoc2024-11-21
| | | Co-authored-by: zeertzjq <zeertzjq@outlook.com>
* feat(editor): handle new multibyte sequences in normal mode replacementbfredl2024-11-04
| While the implementation is not tied to screen chars, it is reasonable to expect it to support the same size.
| If nvim is able to display a multibyte character, it will accept the same character as input, including in normal mode commands like r{char}.
* docs: miscdundargoc2024-10-23
| | | | | | | Co-authored-by: David Pedersen <limero@me.com> Co-authored-by: Gregory Anders <greg@gpanders.com> Co-authored-by: Leo Schlosser <Leo.Schlosser@Student.HTW-Berlin.de> Co-authored-by: zeertzjq <zeertzjq@outlook.com>
* refactor(multibyte): neo-casefolding without allocationbfredl2024-09-29
| | | | fixes #30400
* fix(mbyte): check for utf8proc_map() failure (#30531)zeertzjq2024-09-26
|
* vim-patch:9.1.0719: Resetting cell widths can make 'listchars' or 'fillchars' invalid (#30289)zeertzjq2024-09-06
| Problem:  Resetting cell widths can make 'listchars' or 'fillchars' invalid.
| Solution: Check for conflicts when resetting cell widths (zeertzjq).
|
| closes: vim/vim#15629
|
| https://github.com/vim/vim/commit/66f65a46c5d169f20f780721d4f74d4729855b96
* Merge pull request #30272 from bfredl/replace_emojibfredl2024-09-06
|\ | | | | fix(multibyte): handle backspace of wide clusters in replace mode
| * fix(multibyte): handle backspace of wide clusters in replace modebfredl2024-09-06
| | | | | | | | | | Make utf_head_off more robust against invalid sequences and embedded NUL chars
* | vim-patch:9.1.0716: resetting setcellwidth() doesn't update the screen (#30274)zeertzjq2024-09-06
|/ | | | | | | | | | Problem: resetting setcellwidth() doesn't update the screen Solution: Redraw after clearing the cellwidth table (Ken Takata) closes: vim/vim#15628 https://github.com/vim/vim/commit/539e9b571ae2a80dfa8a42eb132ad9f65f0bbcbc Co-authored-by: Ken Takata <kentkt@csc.jp>
* fix(mbyte): mark any 0xFE0F sequence as a TUI ambiguous width charbfredl2024-09-02
| Some sequences beginning with ASCII might be rendered as emoji, as for instance the emoji 1️⃣, which is encoded as ASCII 0x31 + U+FE0F + U+20E3.
|
| While it is tricky to make the width of such sequences configurable, we can make the TUI be careful with such sequences and reset the cursor, just like for Extended_Pictographic based sequences.
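As a rough illustration of what "any 0xFE0F sequence" means at the byte level, the sketch below scans a UTF-8 buffer for U+FE0F (VARIATION SELECTOR-16), whose encoding is the three bytes EF B8 8F. The helper name `contains_vs16` is hypothetical and this is not how nvim's TUI code is written; it only shows that the keycap sequence from the message above carries the selector while the same sequence without it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper (not an nvim function): returns true if the UTF-8
// buffer `s` of `len` bytes contains U+FE0F, encoded as EF B8 8F.
// In valid UTF-8, 0xEF only ever appears as a lead byte, so a plain
// byte scan cannot match in the middle of another character.
static bool contains_vs16(const unsigned char *s, size_t len)
{
  assert(s != NULL);
  for (size_t i = 0; i + 2 < len; i++) {
    if (s[i] == 0xEF && s[i + 1] == 0xB8 && s[i + 2] == 0x8F) {
      return true;
    }
  }
  return false;
}
```

Called on the bytes `31 EF B8 8F E2 83 A3` (the keycap sequence), the helper reports the selector; a caller could then treat the whole cluster as having an ambiguous width.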
* refactor(multibyte): replace generated unicode tables with utf8procbfredl2024-08-31
| This commit intentionally aims at preserving existing behavior as much as possible while replacing our build step to convert unicode data files into binary tables, with corresponding lookups in utf8proc.
|
| Actual improvements in behavior will be a followup.
|
| The only change in behavior is that the 'emoji' option will turn some more codepoints into double width. Nvim used the "Emoji" and "Emoji_Presentation" properties to define emojis, while utf8proc only exposes the Extended_Pictographic property from the emoji table. This is a superset of the previous emoji properties.
|
| As only codepoints above 0x1f000 are affected by the 'emoji' option, this means that a number of chars are now treated as double-width, instead of single-width like in previous nvim versions. The affected chars form a long list running from 🀀 (U+1F000) to 🩭 (U+1FA6D), among them mahjong tiles, dominoes, playing cards and chess symbols.
* feat(mbyte): support extended grapheme clusters including more emojibfredl2024-08-30
| Use the grapheme break algorithm from utf8proc to support grapheme clusters from recent unicode versions.
|
| Handle the variation selector VS16 (U+FE0F) turning some codepoints into double-width emoji. This means we need to use ptr2cells rather than char2cells when possible.
* refactor!: use utf8proc full casefoldingdundargoc2024-08-07
| According to `CaseFolding-15.1.0.txt`, full casefolding should be preferred over simple casefolding as it's considered to be more correct. Since utf8proc already provides full casefolding it makes sense to switch to it. This will also remove a lot of unnecessary build code.
|
| Temporary exceptions are made for two sets of characters:
|
| - `ß` will still be considered `ß` (instead of `ss`) as using full casefolding requires interfering with upstream spell files in some form.
|
| - `İ` will still be considered `İ` (instead of `i̇`) as using full casefolding requires making a value judgement on the "correct" behavior. There are two equally valid case-insensitive comparisons for this character according to unicode. It is essentially up to the implementor to decide which conversion is correct. For this reason it might make sense to allow users to decide which conversion should be done as an added option to `casemap` in a future PR.
* vim-patch:9.1.0645: regex: wrong match when searching multi-byte char case-insensitivezeertzjq2024-07-31
| Problem:  regex: wrong match when searching multi-byte char case-insensitive (diffsetter)
| Solution: Apply proper case-folding for characters and search-string
|
| This patch does the following 4 things:
|
| 1) When the regexp engine compares two utf-8 codepoints case insensitive it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains.
|
|    This however is not necessarily true because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value.
|
|    Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same.
|
|    However in that case, it should only step over the single byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure:
|    * we try to match the correct length for the pattern and the text
|    * in case of a match, we step over it correctly
|
|    There is one tricky thing for the backtracking engine. We also need to calculate correctly the number of bytes to compare the 2 different utf-8 strings s1 and s2. So we will count the number of characters in s1 that the byte len specified. Then we count the number of bytes to step over the same number of characters in string s2 and then we can correctly compare the 2 utf-8 strings.
|
| 2) A similar thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well.
|
| 3) A related issue turned out, when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition.
|
| 4) When comparing characters using collections, we must also apply case folding to each character in the collection and not just to the current character from the search string. This doesn't apply to the NFA engine, because internally it converts collections to branches [abc] -> a\|b\|c
|
| fixes: vim/vim#14294
| closes: vim/vim#14756
|
| https://github.com/vim/vim/commit/22e8e12d9f5034e1984db0c567b281fda4de8dd7
|
| N/A patches:
| vim-patch:9.0.1771: regex: combining chars in collections not handled
| vim-patch:9.0.1777: patch 9.0.1771 causes problems
|
| Co-authored-by: Christian Brabandt <cb@256bit.org>
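The counting step described for the backtracking engine (count the characters covered by a byte length in s1, then find how many bytes the same number of characters occupies in s2) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `utf8_len`, `char_bytes`, `chars_in_bytes` and `bytes_for_chars` are hypothetical and are not the helpers used in the Vim patch, the strings are assumed to be NUL-terminated, and no case folding is performed here.

```c
#include <assert.h>
#include <stddef.h>

// Declared length of a UTF-8 sequence, judged from its lead byte.
// Invalid lead bytes are treated as one-byte characters.
static size_t utf8_len(unsigned char b)
{
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 1;
}

// Bytes used by the character at `s` (s[0] must not be NUL),
// never stepping past the terminating NUL of a truncated sequence.
static size_t char_bytes(const char *s)
{
  size_t want = utf8_len((unsigned char)s[0]);
  size_t n = 1;
  while (n < want && s[n] != '\0') {
    n++;
  }
  return n;
}

// Number of characters that start within the first `nbytes` bytes of `s`.
static size_t chars_in_bytes(const char *s, size_t nbytes)
{
  assert(s != NULL);
  size_t count = 0;
  for (size_t i = 0; i < nbytes && s[i] != '\0'; i += char_bytes(s + i)) {
    count++;
  }
  return count;
}

// Number of bytes occupied by the first `nchars` characters of `s`.
static size_t bytes_for_chars(const char *s, size_t nchars)
{
  assert(s != NULL);
  size_t i = 0;
  while (nchars > 0 && s[i] != '\0') {
    i += char_bytes(s + i);
    nchars--;
  }
  return i;
}
```

With the pattern 'ſ' (two bytes, one character) and the text "s", `chars_in_bytes` yields 1 for the pattern and `bytes_for_chars` then yields 1 byte for the text, so the matcher would step over one byte rather than two.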
* refactor: remove special handling for lowercase German sharp sdundargoc2024-06-29
| utf8proc already defines LATIN CAPITAL LETTER SHARP S (ẞ) to be the uppercase variant of LATIN SMALL LETTER SHARP S (ß), so this special workaround when using `gU` is no longer needed on the neovim side.
* refactor: remove special-case conversion for german sharp sdundargoc2024-06-29
| | | | | The comment "German sharp s is lower case but has no upper case equivalent." is no longer true and is therefore not needed anymore.
* refactor: replace utf_convert with utf8proc conversion functionsdundargoc2024-06-28
|
* revert: "refactor: use S_LEN macro" (#29319)Lewis Russell2024-06-14
| | | | | revert: "refactor: use S_LEN(s) instead of s, n (#29219)" This reverts commit c37695a5d5f2e8914fff86f3581bed70b4c85d3c.
* refactor: use S_LEN(s) instead of s, n (#29219)James2024-06-11
|
* fixup: LNULJames Tirta Halim2024-06-04
|
* refactor: replace '\0' with NULJames Tirta Halim2024-06-04
|
* refactor: move shared messages to errors.h #26214Justin M. Keyes2024-06-01
|
* vim-patch:9.1.0320: Wrong cursor position after using setcellwidths() (#28334)zeertzjq2024-04-15
| | | | | | | | | | | Problem: Wrong cursor position after using setcellwidths(). Solution: Invalidate cursor position in addition to redrawing. (zeertzjq) closes: vim/vim#14545 https://github.com/vim/vim/commit/05aacec6ab5c7ed8a13bbdca2f0005d6a1816230 Reorder functions in test_utf8.vim to match upstream.
* vim-patch:9.1.0297: Patch 9.1.0296 causes too many issues (#28263)zeertzjq2024-04-11
| | | | | | | | | | | | | | | | | Problem: Patch 9.1.0296 causes too many issues (Tony Mechelynck, chdiza, CI) Solution: Back out the change for now Revert "patch 9.1.0296: regexp: engines do not handle case-folding well" This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It needs more work. fixes: vim/vim#14487 https://github.com/vim/vim/commit/c97f4d61cde24030f2f7d2318e1b409a0ccc3e43 Co-authored-by: Christian Brabandt <cb@256bit.org>
* vim-patch:9.1.0296: regexp: engines do not handle case-folding well (#28259)zeertzjq2024-04-10
| Problem:  Regex engines do not handle case-folding well
| Solution: Correctly calculate byte length of characters to skip
|
| When the regexp engine compares two utf-8 codepoints case insensitively it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains.
|
| This however is not necessarily true because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value.
|
| Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same.
|
| However in that case, it should only step over the single byte value 's', that is by 1 byte, and try to start matching after it again. So for the backtracking engine we need to ensure:
| - we try to match the correct length for the pattern and the text
| - in case of a match, we step over it correctly
|
| The same thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well.
|
| A related issue turned out, when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition.
|
| fixes: vim/vim#14294
| closes: vim/vim#14433
|
| https://github.com/vim/vim/commit/7a27c108e0509f3255ebdcb6558e896c223e4d23
|
| Co-authored-by: Christian Brabandt <cb@256bit.org>
* vim-patch:9.1.0137: <Del> in cmdline mode doesn't delete composing chars (#27636)zeertzjq2024-02-27
| Problem:  <Del> in cmdline mode doesn't delete composing chars
| Solution: Use mb_head_off() and mb_ptr2len() (zeertzjq)
|
| closes: vim/vim#14095
|
| https://github.com/vim/vim/commit/ff2b79d23956263ab0120623c37e0b4498be01db
* fix(mbyte): fix bugs in utf_cp_*_off() functionsVanaIgr2024-02-26
| Problems:
| - Illegal bytes after valid UTF-8 char cause utf_cp_*_off() to fail.
| - When stream isn't NUL-terminated, utf_cp_*_off() may go over the end.
| Solution: Don't go over end of the char or end of the string.
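To make the "don't go over the end" idea concrete, here is a small sketch of a bounded head-offset search: it walks backwards from a pointer to the start of the character containing it, but never before the start of the buffer, never more than three bytes, and never trusting a lead byte whose declared length does not reach the pointer. The names `lead_len` and `head_offset` are hypothetical; this is not the body of nvim's utf_cp_*_off() functions, only an illustration of the bounds checks involved.

```c
#include <assert.h>
#include <stddef.h>

// Declared length of a UTF-8 sequence, judged from its first byte.
// ASCII, stray continuation bytes and invalid lead bytes count as 1.
static size_t lead_len(unsigned char b)
{
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 1;
}

// Offset from `p` back to the first byte of the character containing it.
// Returns 0 when `p` already is a head byte or is a stray byte.
static size_t head_offset(const unsigned char *base, const unsigned char *p)
{
  assert(base != NULL && p != NULL && p >= base);
  if ((*p & 0xC0) != 0x80) {
    return 0;  // not a continuation byte, so it starts a character
  }
  size_t avail = (size_t)(p - base);  // bytes available before p
  for (size_t back = 1; back <= 3 && back <= avail; back++) {
    unsigned char b = *(p - back);
    if ((b & 0xC0) == 0x80) {
      continue;  // still inside a run of continuation bytes
    }
    // Accept this byte as the head only if its sequence reaches p.
    return lead_len(b) > back ? back : 0;
  }
  return 0;
}
```

For the bytes `E2 82 AC 80` (a euro sign followed by an illegal stray byte), the offsets at indices 1 and 2 are 1 and 2, while the stray byte at index 3 gets offset 0 because the three-byte sequence does not extend that far.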
* vim-patch:9.1.0101: upper-case of German sharp s should be U+1E9E (#27449)zeertzjq2024-02-13
| Problem:  upper-case of ß should be U+1E9E (CAPITAL LETTER SHARP S) (fenuks)
| Solution: Make gU, ~ and g~ convert the U+00DF LATIN SMALL LETTER SHARP S (ß) to U+1E9E LATIN CAPITAL LETTER SHARP S (ẞ), update tests (glepnir)
|
| This is part of Unicode 5.1.0 from April 2008, so should be fairly safe to use now and since 2017 is part of the German standard orthography, according to Wikipedia:
| https://en.wikipedia.org/wiki/Capital_%E1%BA%9E#cite_note-auto-12
|
| There is however one exception: UnicodeData.txt for U+00DF LATIN SMALL LETTER SHARP S does NOT define U+1E9E LATIN CAPITAL LETTER SHARP S as its upper case version. Therefore, toupper() won't be able to convert from lower sharp s to upper case sharp s (the other way around however works, since U+00DF is considered the lower case character of U+1E9E and therefore tolower() works correctly for the upper case version).
|
| fixes: vim/vim#5573
| closes: vim/vim#14018
|
| https://github.com/vim/vim/commit/bd1232a1faf56b614a1e74c4ce51bc6e0650ae00
|
| Co-authored-by: glepnir <glephunter@gmail.com>
* vim-patch:9.1.0089: qsort() comparison functions should be transitivezeertzjq2024-02-10
| Problem:  qsort() comparison functions should be transitive
| Solution: Do not subtract values, but rather use explicit comparisons
|
| Improve qsort() comparison functions
|
| There has been a recent report on qsort() causing out-of-bounds read & write in glibc for non-transitive comparison functions
| https://www.qualys.com/2024/01/30/qsort.txt
|
| Even though the bug is in glibc's implementation of the qsort() algorithm, it's bad style to just use subtraction for the comparison functions, which may cause overflow issues and as hinted at in OpenBSD's manual page for qsort(): "It is almost always an error to use subtraction to compute the return value of the comparison function."
|
| So check the qsort() comparison functions and change them to be safe.
|
| closes: vim/vim#13980
|
| https://github.com/vim/vim/commit/e06e43766500ecb4cd1031fa16cf9cbebdb222c1
|
| Co-authored-by: Christian Brabandt <cb@256bit.org>
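The difference between a subtracting comparator and one built from explicit comparisons can be shown with a short sketch. The function name `cmp_int` is hypothetical and is not taken from the patch; the point is only that `x - y` overflows for operands such as INT_MIN and INT_MAX, whereas explicit comparisons always yield a consistent ordering.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

// Comparison function for qsort() over int values.
// Uses explicit comparisons instead of `return x - y;`, which would
// overflow (undefined behaviour) for pairs like INT_MIN and INT_MAX
// and could report an inconsistent, non-transitive ordering.
static int cmp_int(const void *a, const void *b)
{
  assert(a != NULL && b != NULL);
  int x = *(const int *)a;
  int y = *(const int *)b;
  if (x < y) {
    return -1;
  }
  if (x > y) {
    return 1;
  }
  return 0;
}
```

A call such as `qsort(values, count, sizeof values[0], cmp_int);` then sorts correctly even when the array holds values at both extremes of the int range.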
* perf: improve utf_char2cells() performance (#27353)VanaIgr2024-02-07
| `utf_char2cells()` calls `utf_printable()` twice (sometimes indirectly, through `vim_isprintc()`) for characters >= 128. The function can be refactored to call it only once.
|
| `utf_printable()` uses binary search on ranges of unprintable characters to determine if a given character is printable. Since there are only 9 ranges, and the first range contains only one character, binary search can be replaced with SSE2 SIMD comparisons that check 8 ranges at a time, and the first range is checked separately. SSE2 is enabled by default in GCC, Clang and MSVC for x86-64.
|
| Add 3-byte utf-8 to screenpos_spec benchmarks.
* docs: small fixes (#27213)dundargoc2024-02-06
| | | Co-authored-by: Matthieu Coudron <886074+teto@users.noreply.github.com>
* perf: don't decode utf8 character multiple times in getvcol()VanaIgr2024-01-22
| The optimized virtual column calculation loop in getvcol() was decoding the current character twice: once in ptr2cells() and the second time in utfc_ptr2len(). For combining characters, they were decoded up to 2 times in utfc_ptr2len(). Additionally, the function used to decode the character could be further optimised.
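One common way to avoid decoding a character twice is to have a single decode routine return both the codepoint and the byte length, so that width and length lookups can share the result. The sketch below illustrates that idea only; the `Decoded` type and `decode_utf8` function are hypothetical names, not nvim's API, the input is assumed to be NUL-terminated, and overlong or surrogate encodings are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  int32_t value;  // decoded codepoint, or the raw byte if invalid
  int len;        // number of bytes consumed, always >= 1
} Decoded;

// Decode one character from the NUL-terminated UTF-8 string `s`.
// An invalid or truncated sequence is returned as a single raw byte.
static Decoded decode_utf8(const unsigned char *s)
{
  assert(s != NULL);
  unsigned char b = s[0];
  int len = 1;
  if ((b & 0xE0) == 0xC0) {
    len = 2;
  } else if ((b & 0xF0) == 0xE0) {
    len = 3;
  } else if ((b & 0xF8) == 0xF0) {
    len = 4;
  }
  if (len == 1) {
    return (Decoded){ b, 1 };
  }
  // Keep the payload bits of the lead byte: 5, 4 or 3 bits.
  int32_t value = b & (0xFF >> (len + 1));
  for (int i = 1; i < len; i++) {
    // A NUL or any non-continuation byte ends the sequence early,
    // so the loop never reads past the terminator.
    if ((s[i] & 0xC0) != 0x80) {
      return (Decoded){ b, 1 };
    }
    value = (value << 6) | (s[i] & 0x3F);
  }
  return (Decoded){ value, len };
}
```

A column-counting loop could call `decode_utf8` once per character, pass `value` to its width lookup, and advance the pointer by `len`.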
* refactor(IWYU): fix headersdundargoc2024-01-11
| Remove `export` pragmas from defs headers as they cause IWYU to believe that the definitions from the defs headers come from the main header, which is not what we really want.
* refactor: follow style guidedundargoc2023-12-30
|
* refactor: run IWYU on entire repodundargoc2023-12-21
| | | | Reference: https://github.com/neovim/neovim/issues/6371.
* docs: add style rule regarding initializationdundargoc2023-12-18
| | | | | Specifically, specify that each initialization should be done on a separate line.
* refactor: fix headers with IWYUdundargoc2023-11-28
|
* refactor: iwyu (#26269)zeertzjq2023-11-28
|
* refactor: rename types.h to types_defs.hdundargoc2023-11-27
|
* build(IWYU): fix includes for undo_defs.hdundargoc2023-11-27
|
* build: enable IWYU on macdundargoc2023-11-27
|
* build(IWYU): map everything in the C99 specificationdundargoc2023-11-26
|
* build: rework IWYU mapping filesdundargoc2023-11-25
| Create mapping to most of the C spec and some POSIX specific functions. This is more robust than relying on files shipped with IWYU.
* refactor: follow style guidedundargoc2023-11-19
| | | | | - reduce variable scope - prefer initialization over declaration and assignment
* refactor(grid): make screen rendering more multibyte than ever beforebfredl2023-11-17
| Problem: buffer text with composing chars is converted from UTF-8 to an array of up to seven UTF-32 values and then converted back to UTF-8 strings.
|
| Solution: Convert buffer text directly to UTF-8 based schar_T values.
|
| The limit of the text size is now in schar_T bytes, which is currently 31+1 but easily could be raised as it no longer multiplies the size of the entire screen grid when not used, the full size is only required for temporary scratch buffers.
|
| Also does some general cleanup to win_line text handling, which was unnecessarily complicated due to multibyte rendering being an "opt-in" feature long ago. Nowadays, a char is just a char, regardless of whether it consists of one ASCII byte or multiple bytes.
* refactor: follow style guidedundargoc2023-11-13
| | | | | | - reduce variable scope - prefer initialization over declaration and assignment - use bool to represent boolean values
* build: remove PVSdundargoc2023-11-12
| | | | | | | We already have an extensive suite of static analysis tools we use, which causes a fair bit of redundancy as we get duplicate warnings. PVS is also prone to give false warnings which creates a lot of work to identify and disable.
* refactor: the long goodbyedundargoc2023-11-05
| `long` is 32 bits on Windows, while it is 64 bits on most other 64-bit platforms. This makes the type suboptimal for a codebase meant to be cross-platform. Replace it with more appropriate integer types.
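A brief sketch of why a fixed-width type is the safer choice: the hypothetical helper below computes a byte offset that can exceed the range of a 32-bit `long`, and uses `int64_t` so the result has the same width on every platform that provides the type. The name `block_offset` is an assumption made for illustration and does not come from the nvim source.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: byte offset of block number `block`.
// If the operands were `long`, the product could overflow on platforms
// where `long` is 32 bits; int64_t is exactly 64 bits wherever it exists.
static int64_t block_offset(int64_t block, int64_t block_size)
{
  assert(block >= 0 && block_size > 0);
  return block * block_size;
}
```

For example, `block_offset(3000000, 1024)` is 3072000000, which does not fit in a signed 32-bit integer.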