| Commit message (Collapse) | Author | Age |
|
|
| |
Result of `make iwyu` (after some "fixups").
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Option metadata like list of valid values for an option and
option flags are not listed in the `options.lua` file and are instead
manually defined in C, which means option metadata is split between
several places.
Solution: Put metadata such as list of valid values for an option and
option flags in `options.lua`, and autogenerate the corresponding C
variables and enums.
Supersedes #28659
Co-authored-by: glepnir <glephunter@gmail.com>
|
|
|
| |
Co-authored-by: zeertzjq <zeertzjq@outlook.com>
|
|
|
|
|
|
|
| |
while the implementation is not tied to screen chars, it is a reasonable
expectation to support the same size. If nvim is able to display a
multibyte character, it will accept the same character as input,
including in normal mode commands like r{char}
|
|
|
|
|
|
|
| |
Co-authored-by: David Pedersen <limero@me.com>
Co-authored-by: Gregory Anders <greg@gpanders.com>
Co-authored-by: Leo Schlosser <Leo.Schlosser@Student.HTW-Berlin.de>
Co-authored-by: zeertzjq <zeertzjq@outlook.com>
|
|
|
|
| |
fixes #30400
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
'fillchars' invalid (#30289)
Problem: Resetting cell widths can make 'listchars' or 'fillchars'
invalid.
Solution: Check for conflicts when resetting cell widths (zeertzjq).
closes: vim/vim#15629
https://github.com/vim/vim/commit/66f65a46c5d169f20f780721d4f74d4729855b96
|
|\
| |
| | |
fix(multibyte): handle backspace of wide clusters in replace mode
|
| |
| |
| |
| |
| | |
Make utf_head_off more robust against invalid sequences
and embedded NUL chars
|
|/
|
|
|
|
|
|
|
|
| |
Problem: resetting setcellwidth() doesn't update the screen
Solution: Redraw after clearing the cellwidth table (Ken Takata)
closes: vim/vim#15628
https://github.com/vim/vim/commit/539e9b571ae2a80dfa8a42eb132ad9f65f0bbcbc
Co-authored-by: Ken Takata <kentkt@csc.jp>
|
|
|
|
|
|
|
|
|
| |
Some sequences beginning with ASCII might be rendered as emoji, as for
instance emoji 1๏ธโฃ which is encoded as ascii 0x31 + U+FE0F + U+20E3.
While it is tricky to make the width of such sequences configurable,
we can make TUI be careful with such sequences and reset the cursor,
just like for Extended_Pictogram based sequences.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit intentionally aims at preserving existing behavior as much
as possible while replacing our build step to convert unicode data
files into binary tables, which corresponding lookups in utf8proc.
Actual improvements in behavior will be a followup.
The only change in behavior is that 'emoji' option will turn some
more codepoints into double with. Nvim used the "Emoji" and
"Emoji_Presentation" properties to define emojis, while utf8proc
only exposes the Extended_Pictographic property from the emoji table.
This is a superset of the previous emoji properties. As only
codepoints above 0x1f000 are affected by the 'emoji' option, this means
that the following chars are now treated as double-width, instead of
single-width like in previous nvim versions:
๐ ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ก ๐ข ๐ฃ ๐ค
๐ฅ ๐ฆ ๐ง ๐จ ๐ฉ ๐ช ๐ซ ๐ฐ ๐ฑ ๐ฒ ๐ณ ๐ด ๐ต ๐ถ ๐ท ๐ธ ๐น ๐บ ๐ป ๐ผ ๐ฝ ๐พ ๐ฟ ๐ ๐ ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ก ๐ข ๐ฃ ๐ค ๐ฅ ๐ฆ ๐ง ๐จ ๐ฉ ๐ช ๐ซ ๐ฌ ๐ญ ๐ฎ ๐ฏ ๐ฐ
๐ฑ ๐ฒ ๐ณ ๐ด ๐ต ๐ถ ๐ท ๐ธ ๐น ๐บ ๐ป ๐ผ ๐ฝ ๐พ ๐ฟ ๐ ๐ ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐
๐ก ๐ข ๐ฃ ๐ค ๐ฅ ๐ฆ ๐ง ๐จ ๐ฉ ๐ช ๐ซ ๐ฌ ๐ญ ๐ฎ ๐ฑ ๐ฒ ๐ณ ๐ด ๐ต ๐ถ ๐ท ๐ธ ๐น ๐บ ๐ป ๐ผ ๐ฝ ๐พ ๐ฟ ๐ ๐ ๐ ๐ ๐
๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ก ๐ข ๐ฃ ๐ค ๐ฅ ๐ฆ ๐ง ๐จ ๐ฉ ๐ช ๐ซ ๐ฌ ๐ญ
๐ฎ ๐ฏ ๐ฐ ๐ฑ ๐ฒ ๐ณ ๐ด ๐ต ๐ ๐ ๐ ๐ฏ ๐
ฌ ๐
ญ ๐
ฎ ๐
ฏ ๐ญ ๐ข ๐ฃ ๐ ๐ ๐ ๐ ๐ ๐ฑ ๐ฒ ๐ถ ๐พ ๐ ๐ ๐ ๐ ๐จ ๐ฉ ๐ช ๐ซ
๐ฌ ๐ญ ๐ฎ ๐ฑ ๐ฒ ๐ป ๐ผ ๐ฝ ๐พ ๐ฟ ๐ ๐ ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ก
๐ข ๐ฃ ๐ฆ ๐ง ๐ฉ ๐ช ๐ซ ๐ฌ ๐ญ ๐ฎ ๐ฏ ๐ฐ ๐ณ ๐ด ๐ต ๐ถ ๐ท ๐ธ ๐น ๐บ ๐ป ๐ฝ ๐พ ๐ฟ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ข ๐ค ๐ฅ ๐ฆ ๐ง ๐ฉ ๐ช ๐ซ ๐ฌ ๐ญ ๐ฎ ๐ฐ ๐ฑ ๐ฒ ๐ด ๐ต ๐ถ ๐ท ๐ธ ๐น ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐ฆ ๐ง ๐จ ๐ช ๐ฑ ๐ฒ ๐ด ๐ต ๐ถ ๐ป ๐ผ ๐ฝ ๐พ ๐ฟ ๐ ๐ ๐ ๐ ๐ ๐ขฐ ๐ขฑ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ
๐จ ๐จ ๐จ ๐จ ๐จ ๐จ
๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จ ๐จก ๐จข ๐จฃ ๐จค ๐จฅ ๐จฆ ๐จง ๐จจ ๐จฉ ๐จช ๐จซ ๐จฌ ๐จญ ๐จฎ ๐จฏ
๐จฐ ๐จฑ ๐จฒ ๐จณ ๐จด ๐จต ๐จถ ๐จท ๐จธ ๐จน ๐จบ ๐จป ๐จผ ๐จฝ ๐จพ ๐จฟ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ
๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ ๐ฉ
๐ฉ ๐ฉก ๐ฉข ๐ฉฃ ๐ฉค ๐ฉฅ ๐ฉฆ ๐ฉง ๐ฉจ ๐ฉฉ ๐ฉช ๐ฉซ ๐ฉฌ ๐ฉญ
|
|
|
|
|
|
|
|
|
| |
Use the grapheme break algorithm from utf8proc to support grapheme
clusters from recent unicode versions.
Handle variant selector VS16 turning some codepoints into double-width
emoji. This means we need to use ptr2cells rather than char2cells when
possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to `CaseFolding-15.1.0.txt`, full casefolding should be
preferred over simple casefolding as it's considered to be more correct.
Since utf8proc already provides full casefolding it makes sense to
switch to it. This will also remove a lot of unnecessary build code.
Temporary exceptions are made for two sets characters:
- `ร` will still be considered `ร` (instead of `ss`) as using a full
casefolding requires interfering with upstream spell files in some
form.
- `ฤฐ` will still be considered `ฤฐ` (instead of `iฬ`) as using full
casefolding requires making a value judgement on the "correct"
behavior. There are two, equally valid case-insensetive comparison for
this character according to unicode. It is essentially up to the
implementor to decide which conversion is correct. For this reason it
might make sense to allow users to decide which conversion should be
done as an added option to `casemap` in a future PR.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
case-insensitive
Problem: regex: wrong match when searching multi-byte char
case-insensitive (diffsetter)
Solution: Apply proper case-folding for characters and search-string
This patch does the following 4 things:
1) When the regexp engine compares two utf-8 codepoints case
insensitive it may match an adjacent character, because it assumes
it can step over as many bytes as the pattern contains.
This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some
single-byte value.
Let's consider the pattern 'ลฟ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.
However in that case, it should only step over the single byte value
's' by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
* we try to match the correct length for the pattern and the text
* in case of a match, we step over it correctly
There is one tricky thing for the backtracing engine. We also need to
calculate correctly the number of bytes to compare the 2 different
utf-8 strings s1 and s2. So we will count the number of characters in
s1 that the byte len specified. Then we count the number of bytes to
step over the same number of characters in string s2 and then we can
correctly compare the 2 utf-8 strings.
2) A similar thing can happen for the NFA engine, when skipping to the
next character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over.
So this needs to be adjusted in find_match_text() as well.
3) A related issue turned out, when prog->match_text is actually empty.
In that case we should try to find the next match and skip this
condition.
4) When comparing characters using collections, we must also apply case
folding to each character in the collection and not just to the
current character from the search string. This doesn't apply to the
NFA engine, because internally it converts collections to branches
[abc] -> a\|b\|c
fixes: vim/vim#14294
closes: vim/vim#14756
https://github.com/vim/vim/commit/22e8e12d9f5034e1984db0c567b281fda4de8dd7
N/A patches:
vim-patch:9.0.1771: regex: combining chars in collections not handled
vim-patch:9.0.1777: patch 9.0.1771 causes problems
Co-authored-by: Christian Brabandt <cb@256bit.org>
|
|
|
|
|
|
| |
utf8proc already defines LATIN CAPITAL LETTER SHARP S (แบ) to be the
uppercase variant of LATIN SMALL LETTER SHARP S (ร), so this special
workaround when using `gU` is no longer needed on the neovim side.
|
|
|
|
|
| |
The comment "German sharp s is lower case but has no upper case
equivalent." is no longer true and is therefore not needed anymore.
|
| |
|
|
|
|
|
| |
revert: "refactor: use S_LEN(s) instead of s, n (#29219)"
This reverts commit c37695a5d5f2e8914fff86f3581bed70b4c85d3c.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Wrong cursor position after using setcellwidths().
Solution: Invalidate cursor position in addition to redrawing.
(zeertzjq)
closes: vim/vim#14545
https://github.com/vim/vim/commit/05aacec6ab5c7ed8a13bbdca2f0005d6a1816230
Reorder functions in test_utf8.vim to match upstream.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Patch 9.1.0296 causes too many issues
(Tony Mechelynck, chdiza, CI)
Solution: Back out the change for now
Revert "patch 9.1.0296: regexp: engines do not handle case-folding well"
This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes
issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It
needs more work.
fixes: vim/vim#14487
https://github.com/vim/vim/commit/c97f4d61cde24030f2f7d2318e1b409a0ccc3e43
Co-authored-by: Christian Brabandt <cb@256bit.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Regex engines do not handle case-folding well
Solution: Correctly calculate byte length of characters to skip
When the regexp engine compares two utf-8 codepoints case insensitively
it may match an adjacent character, because it assumes it can step over
as many bytes as the pattern contains.
This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some single-byte
value.
Let's consider the pattern 'ลฟ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.
However in that case, it should only step over the single byte
value 's' so by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
- we try to match the correct length for the pattern and the text
- in case of a match, we step over it correctly
The same thing can happen for the NFA engine, when skipping to the next
character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over. So
this needs to be adjusted in find_match_text() as well.
A related issue turned out, when prog->match_text is actually empty. In
that case we should try to find the next match and skip this condition.
fixes: vim/vim#14294
closes: vim/vim#14433
https://github.com/vim/vim/commit/7a27c108e0509f3255ebdcb6558e896c223e4d23
Co-authored-by: Christian Brabandt <cb@256bit.org>
|
|
|
|
|
|
|
|
|
|
| |
(#27636)
Problem: <Del> in cmdline mode doesn't delete composing chars
Solution: Use mb_head_off() and mb_ptr2len() (zeertzjq)
closes: vim/vim#14095
https://github.com/vim/vim/commit/ff2b79d23956263ab0120623c37e0b4498be01db
|
|
|
|
|
|
| |
Problems:
- Illegal bytes after valid UTF-8 char cause utf_cp_*_off() to fail.
- When stream isn't NUL-terminated, utf_cp_*_off() may go over the end.
Solution: Don't go over end of the char of end of the string.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: upper-case of ร should be U+1E9E (CAPITAL LETTER SHARP S)
(fenuks)
Solution: Make gU, ~ and g~ convert the U+00DF LATIN SMALL LETTER SHARP S (ร)
to U+1E9E LATIN CAPITAL LETTER SHARP S (แบ), update tests
(glepnir)
This is part of Unicode 5.1.0 from April 2008, so should be fairly safe
to use now and since 2017 is part of the German standard orthography,
according to Wikipedia:
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E#cite_note-auto-12
There is however one exception: UnicodeData.txt for U+00DF
LATIN SMALL LETTER SHARP S does NOT define U+1E9E LATIN CAPITAL LETTER
SHARP S as its upper case version. Therefore, toupper() won't be able
to convert from lower sharp s to upper case sharp s (the other way
around however works, since U+00DF is considered the lower case
character of U+1E9E and therefore tolower() works correctly for the
upper case version).
fixes: vim/vim#5573
closes: vim/vim#14018
https://github.com/vim/vim/commit/bd1232a1faf56b614a1e74c4ce51bc6e0650ae00
Co-authored-by: glepnir <glephunter@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: qsort() comparison functions should be transitive
Solution: Do not subtract values, but rather use explicit comparisons
Improve qsort() comparison functions
There has been a recent report on qsort() causing out-of-bounds read &
write in glibc for non transitive comparison functions
https://www.qualys.com/2024/01/30/qsort.txt
Even so the bug is in glibc's implementation of the qsort() algorithm,
it's bad style to just use substraction for the comparison functions,
which may cause overflow issues and as hinted at in OpenBSD's manual
page for qsort(): "It is almost always an error to use subtraction to
compute the return value of the comparison function."
So check the qsort() comparison functions and change them to be safe.
closes: vim/vim#13980
https://github.com/vim/vim/commit/e06e43766500ecb4cd1031fa16cf9cbebdb222c1
Co-authored-by: Christian Brabandt <cb@256bit.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`utf_char2cells()` calls `utf_printable()` twice (sometimes indirectly,
through `vim_isprintc()`) for characters >= 128. The function can be
refactored to call to it only once.
`utf_printable()` uses binary search on ranges of unprintable characters
to determine if a given character is printable. Since there are only 9
ranges, and the first range contains only one character, binary search
can be replaced with SSE2 SIMD comparisons that check 8 ranges at a
time, and the first range is checked separately. SSE2 is enabled by
default in GCC, Clang and MSVC for x86-64.
Add 3-byte utf-8 to screenpos_spec benchmarks.
|
|
|
| |
Co-authored-by: Matthieu Coudron <886074+teto@users.noreply.github.com>
|
|
|
|
|
|
|
|
| |
The optimized virtual column calculation loop in getvcol()
was decoding the current character twice: once in ptr2cells()
and the second time in utfc_ptr2len(). For combining charcters, they were
decoded up to 2 times in utfc_ptr2len(). Additionally, the function used to
decode the character could be further optimised.
|
|
|
|
|
|
| |
Remove `export` pramgas from defs headers as it causes IWYU to believe
that the definitions from the defs headers comes from main header, which
is not what we really want.
|
| |
|
|
|
|
| |
Reference: https://github.com/neovim/neovim/issues/6371.
|
|
|
|
|
| |
Specifically, specify that each initialization should be done on a
separate line.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Create mapping to most of the C spec and some POSIX specific functions.
This is more robust than relying files shipped with IWYU.
|
|
|
|
|
| |
- reduce variable scope
- prefer initialization over declaration and assignment
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: buffer text with composing chars are converted from UTF-8
to an array of up to seven UTF-32 values and then converted back
to UTF-8 strings.
Solution: Convert buffer text directly to UTF-8 based schar_T values.
The limit of the text size is now in schar_T bytes, which is currently
31+1 but easily could be raised as it no longer multiplies the size
of the entire screen grid when not used, the full size is only required
for temporary scratch buffers.
Also does some general cleanup to win_line text handling, which was
unnecessarily complicated due to multibyte rendering being an "opt-in"
feature long ago. Nowadays, a char is just a char, regardless if it consists
of one ASCII byte or multiple bytes.
|
|
|
|
|
|
| |
- reduce variable scope
- prefer initialization over declaration and assignment
- use bool to represent boolean values
|
|
|
|
|
|
|
| |
We already have an extensive suite of static analysis tools we use,
which causes a fair bit of redundancy as we get duplicate warnings. PVS
is also prone to give false warnings which creates a lot of work to
identify and disable.
|
|
|
|
|
|
| |
long is 32 bits on windows, while it is 64 bits on other architectures.
This makes the type suboptimal for a codebase meant to be
cross-platform. Replace it with more appropriate integer types.
|