path: root/src/nvim/regexp.c
Commit message (Author, Age)
* refactor: iwyu #31637 (Justin M. Keyes, 2024-12-23)
| | | Result of `make iwyu` (after some "fixups").
* feat(mbyte): support extended grapheme clusters including more emoji (bfredl, 2024-08-30)
| | | | | | | | | Use the grapheme break algorithm from utf8proc to support grapheme clusters from recent unicode versions. Handle variant selector VS16 turning some codepoints into double-width emoji. This means we need to use ptr2cells rather than char2cells when possible.
* fix(regexp): fix typo in E888 error message (#30161) (Eisuke Kawashima, 2024-08-28)
| | | Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
* vim-patch:9.0.0634: evaluating "expr" options has more overhead than needed (zeertzjq, 2024-08-02)
| | | | | | | | | | | | | | | | | | Problem: Evaluating "expr" options has more overhead than needed. Solution: Use call_simple_func() for 'foldtext', 'includeexpr', 'printexpr', "expr" of 'spellsuggest', 'diffexpr', 'patchexpr', 'balloonexpr', 'formatexpr', 'indentexpr' and 'charconvert'. https://github.com/vim/vim/commit/a4e0b9785e409e9e660171cea76dfcc5fdafad9b vim-patch:9.0.0635: build error and compiler warnings Problem: Build error and compiler warnings. Solution: Add missing change. Add type casts. https://github.com/vim/vim/commit/3292a229402c9892f5ab90645fbfe2b1db342f5b Co-authored-by: Bram Moolenaar <Bram@vim.org>
* vim-patch:9.1.0650: Coverity warning in cstrncmp() (#29944) (zeertzjq, 2024-08-01)
|
| Problem:  Coverity warning in cstrncmp() (after v9.1.0645)
| Solution: Change the type of n2 to int. (zeertzjq)
|
| ________________________________________________________________________________________________________
| *** CID 1615684: Integer handling issues (INTEGER_OVERFLOW)
| /src/regexp.c: 1757 in cstrncmp()
| 1751             n1 -= mb_ptr2len(s1);
| 1752             MB_PTR_ADV(p);
| 1753             n2++;
| 1754         }
| 1755         // count the number of bytes to advance the same number of chars for s2
| 1756         p = s2;
| >>> CID 1615684: Integer handling issues (INTEGER_OVERFLOW)
| >>> Expression "n2--", which is equal to 18446744073709551615, where "n2" is known to be equal to 0, underflows the type that receives it, an unsigned integer 64 bits wide.
| 1757         while (n2-- > 0 && *p != NUL)
| 1758             MB_PTR_ADV(p);
| 1759
| 1760         n2 = p - s2;
| 1761
| 1762         result = MB_STRNICMP2(s1, s2, *n, n2);
|
| closes: vim/vim#15409
|
| https://github.com/vim/vim/commit/e8feaa354e685e527198093904492f67c52c2302
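The Coverity finding in 9.1.0650 comes down to post-decrementing an unsigned counter that is already zero. The following is a minimal sketch of that behavior only; `countdown_unsigned` and `countdown_signed` are hypothetical helpers, not code from cstrncmp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not from regexp.c): run a `while (n-- > 0)` loop with
// an unsigned counter and return the counter's value afterwards.
size_t countdown_unsigned(size_t n)
{
  while (n-- > 0) {
    // Loop body elided; only the counter matters here.
  }
  // The final test decrements n from 0, so it wraps around to SIZE_MAX.
  return n;
}

// The same loop with a signed int, the kind of type change the patch
// describes for n2: the final decrement yields -1 instead of wrapping.
int countdown_signed(int n)
{
  assert(n >= 0);  // this sketch only covers non-negative counts
  while (n-- > 0) {
  }
  return n;
}
```

The wrap-around is well-defined for unsigned types in C, which is why this shows up as a static-analysis warning rather than a crash; the sketch makes no claim about whether the wrapped value was ever used in the original code.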
* vim-patch:9.1.0645: regex: wrong match when searching multi-byte char case-insensitive (zeertzjq, 2024-07-31)
|
| Problem:  regex: wrong match when searching multi-byte char case-insensitive (diffsetter)
| Solution: Apply proper case-folding for characters and search-string
|
| This patch does the following 4 things:
|
| 1) When the regexp engine compares two utf-8 codepoints case-insensitively, it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains.
|
| This however is not necessarily true because of case-folding: a multi-byte UTF-8 character can be considered equal to some single-byte value.
|
| Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the number of bytes of the pattern), assuming that since it matches, the length of both strings is the same.
|
| However in that case, it should only step over the single-byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure:
| * we try to match the correct length for the pattern and the text
| * in case of a match, we step over it correctly
|
| There is one tricky thing for the backtracking engine. We also need to calculate correctly the number of bytes to compare the 2 different utf-8 strings s1 and s2. So we count the number of characters in s1 covered by the specified byte length. Then we count the number of bytes needed to step over the same number of characters in string s2, and then we can correctly compare the 2 utf-8 strings.
|
| 2) A similar thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well.
|
| 3) A related issue turned up when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition.
|
| 4) When comparing characters using collections, we must also apply case folding to each character in the collection and not just to the current character from the search string. This doesn't apply to the NFA engine, because internally it converts collections to branches [abc] -> a\|b\|c
|
| fixes: vim/vim#14294
| closes: vim/vim#14756
|
| https://github.com/vim/vim/commit/22e8e12d9f5034e1984db0c567b281fda4de8dd7
|
| N/A patches:
| vim-patch:9.0.1771: regex: combining chars in collections not handled
| vim-patch:9.0.1777: patch 9.0.1771 causes problems
|
| Co-authored-by: Christian Brabandt <cb@256bit.org>
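The byte-length bookkeeping from point 1 of the 9.1.0645 message can be sketched roughly as follows. This is an illustration under stated assumptions, not the patched cstrncmp(): `u8_len` and `matching_byte_len` are hypothetical names, and `u8_len` is a simplified stand-in for a real UTF-8 length routine. The idea is to count how many characters the first `n1` bytes of `s1` cover, then measure how many bytes the same number of characters occupy in `s2`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, simplified UTF-8 helper: byte length of the character that
// starts at p, never stepping past a terminating NUL.
static size_t u8_len(const char *p)
{
  unsigned char c = (unsigned char)p[0];
  size_t want = c < 0x80 ? 1
                : (c & 0xE0) == 0xC0 ? 2
                : (c & 0xF0) == 0xE0 ? 3
                : (c & 0xF8) == 0xF0 ? 4
                : 1;  // invalid lead byte: treat as a single byte
  size_t len = 1;
  // Only count continuation bytes (10xxxxxx) that are actually present.
  while (len < want && ((unsigned char)p[len] & 0xC0) == 0x80) {
    len++;
  }
  return len;
}

// Return how many bytes of s2 hold as many characters as the first n1 bytes
// of s1 do. With case folding, this can differ from n1.
size_t matching_byte_len(const char *s1, size_t n1, const char *s2)
{
  assert(s1 != NULL && s2 != NULL);

  // Count the characters covered by the first n1 bytes of s1.
  size_t chars = 0;
  const char *p = s1;
  while ((size_t)(p - s1) < n1 && *p != '\0') {
    p += u8_len(p);
    chars++;
  }

  // Step over the same number of characters in s2.
  p = s2;
  while (chars > 0 && *p != '\0') {
    p += u8_len(p);
    chars--;
  }
  return (size_t)(p - s2);
}
```

For the example in the message, the pattern 'ſ' (U+017F) is two bytes in UTF-8 while the text 's' is one, so a comparison that assumes equal byte lengths would step one byte too far.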
* vim-patch:9.0.0105: illegal memory access when pattern starts with illegal byte (zeertzjq, 2024-07-31)
| | | | | | | | | Problem: Illegal memory access when pattern starts with illegal byte. Solution: Do not match a character with an illegal byte. https://github.com/vim/vim/commit/f50940531dd57135fe60aa393ac9d3281f352d88 Co-authored-by: Bram Moolenaar <Bram@vim.org>
* vim-patch:9.0.0414: matchstr() still does not match column offset (zeertzjq, 2024-07-17)
| | | | | | | | | | | Problem: matchstr() still does not match column offset when done after a text search. Solution: Only use the line number for a multi-line search. Fix the test. (closes vim/vim#10938) https://github.com/vim/vim/commit/753aead960f163d0d3f8ce523ea523f2e0cec06d Co-authored-by: Bram Moolenaar <Bram@vim.org>
* vim-patch:9.0.0407: matchstr() does match column offset (zeertzjq, 2024-07-17)
| | | | | | | | | Problem: matchstr() does match column offset. (Yasuhiro Matsumoto) Solution: Accept line number zero. (closes vim/vim#10938) https://github.com/vim/vim/commit/75a115e8d632e96b4f45dc5145ba261876a83dcf Co-authored-by: Bram Moolenaar <Bram@vim.org>
* vim-patch:9.0.0228: crash when pattern looks below the last line (zeertzjq, 2024-07-17)
| | | | | | | | | | | Problem: Crash when pattern looks below the last line. Solution: Consider invalid lines to be empty. (closes vim/vim#10938) https://github.com/vim/vim/commit/13ed494bb5edc5a02d0ed0feabddb68920f88570 Comment out the test as it uses Vim9 script and text properties. Co-authored-by: Bram Moolenaar <Bram@vim.org>
* revert: "refactor: use S_LEN macro" (#29319) (Lewis Russell, 2024-06-14)
| | | | | revert: "refactor: use S_LEN(s) instead of s, n (#29219)" This reverts commit c37695a5d5f2e8914fff86f3581bed70b4c85d3c.
* Merge pull request #29278 from bfredl/strcat (bfredl, 2024-06-11)
|\ | | | | refactor(memory): use builtin strcat() instead of STRCAT()
| * refactor(memory): use builtin strcat() instead of STRCAT() (bfredl, 2024-06-11)
| | | | | | | | | | | | | | | | The latter was mostly relevant with the past char_u madness. NOTE: STRCAT also functioned as a counterfeit "NOLINT" for clint apparently. But NOLINT-ing every usecase is just the same as disabling the check entirely.
* | refactor: use S_LEN(s) instead of s, n (#29219) (James, 2024-06-11)
|/
* refactor: replace '\0' with NUL (James Tirta Halim, 2024-06-04)
|
* refactor: move shared messages to errors.h #26214 (Justin M. Keyes, 2024-06-01)
|
* vim-patch:9.1.0438: Wrong Ex command executed when :g uses '?' as delimiter (#28956) (zeertzjq, 2024-05-24)
| | | | | | | | | | Problem: Wrong Ex command executed when :g uses '?' as delimiter and pattern contains escaped '?'. Solution: Don't use "*newp" when it's not allocated (zeertzjq). closes: vim/vim#14837 https://github.com/vim/vim/commit/3074137542961ce7b3b65c14ebde75f13f5e6147
* vim-patch:9.1.0436: Crash when using '?' as separator for :s (#28955) (zeertzjq, 2024-05-24)
| | | | | | | | | | Problem: Crash when using '?' as separator for :s and pattern contains escaped '?'s (after 9.1.0409). Solution: Always compute startplen. (zeertzjq). related: neovim/neovim#28935 closes: 14832 https://github.com/vim/vim/commit/789679cfc4f39505b135220672b43a260d8ca3b4
* vim-patch:9.1.0409: too many strlen() calls in the regexp engine (#28857) (zeertzjq, 2024-05-20)
| | | | | | | | | | | | | | | | | Problem: too many strlen() calls in the regexp engine Solution: refactor code to retrieve strlen differently, make use of bsearch() for getting the character class (John Marriott) closes: vim/vim#14648 https://github.com/vim/vim/commit/82792db6315f7c7b0e299cdde1566f2932a463f8 Cherry-pick keyvalue_T and its comparison functions from patch 9.1.0256. vim-patch:9.1.0410: warning about uninitialized variable vim-patch:9.1.0412: typo in regexp_bt.c in DEBUG code Co-authored-by: John Marriott <basilisk@internode.on.net>
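The 9.1.0409 message mentions using bsearch() to look up the character class. A rough sketch of that technique, assuming a table sorted by name; `class_entry`, `class_tab` and `lookup_class` are hypothetical and are not the keyvalue_T code from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical key/value pair; the real patch uses its own keyvalue_T type.
typedef struct {
  int value;
  const char *key;
} class_entry;

// The table must stay sorted by key (strcmp order) for bsearch() to work.
static const class_entry class_tab[] = {
  { 0, "alnum" }, { 1, "alpha" }, { 2, "blank" }, { 3, "cntrl" },
  { 4, "digit" }, { 5, "graph" }, { 6, "lower" }, { 7, "print" },
  { 8, "punct" }, { 9, "space" }, { 10, "upper" }, { 11, "xdigit" },
};

// Comparison callback: order two entries by their key strings.
static int cmp_entry(const void *a, const void *b)
{
  const class_entry *ea = a;
  const class_entry *eb = b;
  return strcmp(ea->key, eb->key);
}

// Return the value for a class name, or -1 if the name is unknown.
int lookup_class(const char *name)
{
  assert(name != NULL);
  class_entry target = { .value = -1, .key = name };
  const class_entry *hit = bsearch(&target, class_tab,
                                   sizeof(class_tab) / sizeof(class_tab[0]),
                                   sizeof(class_tab[0]), cmp_entry);
  return hit != NULL ? hit->value : -1;
}
```

Compared with a chain of string comparisons, a binary search over a sorted table needs only a logarithmic number of comparisons per lookup.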
* docs: misc (#28609) (dundargoc, 2024-05-15)
| | | | | | | | | | | | Closes https://github.com/neovim/neovim/issues/28484. Closes https://github.com/neovim/neovim/issues/28719. Co-authored-by: Chris <crwebb85@gmail.com> Co-authored-by: Gregory Anders <greg@gpanders.com> Co-authored-by: Jake B <16889000+jakethedev@users.noreply.github.com> Co-authored-by: Jonathan Raines <jonathan.s.raines@gmail.com> Co-authored-by: Yi Ming <ofseed@foxmail.com> Co-authored-by: Zane Dufour <zane@znd4.me> Co-authored-by: zeertzjq <zeertzjq@outlook.com>
* refactor: add xmemcpyz() and use it in place of some xstrlcpy() (#28422) (zeertzjq, 2024-04-20)
| | | | | | Problem: Using xstrlcpy() when the exact length of the string to be copied is known is not ideal because it requires adding 1 to the length and an unnecessary strlen(). Solution: Add xmemcpyz() and use it in place of such xstrlcpy() calls.
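The idea behind this change can be sketched as a copy that takes the known length directly, so no strlen() of the source is needed and the caller does not have to pass length + 1. `copy_known_len` is a hypothetical name used for illustration; it is not claimed to be Neovim's xmemcpyz().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: copy exactly `len` bytes from src and NUL-terminate.
// The caller must guarantee that dst has room for len + 1 bytes.
char *copy_known_len(char *dst, const char *src, size_t len)
{
  assert(dst != NULL && src != NULL);
  memcpy(dst, src, len);  // no scan of src for its terminator
  dst[len] = '\0';        // terminate at the known length
  return dst;
}
```

For example, copying the first 5 bytes of "hello world" into an 8-byte buffer leaves the buffer holding "hello".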
* refactor: fix clang NonNullParamChecker warnings (#28327) (zeertzjq, 2024-04-14)
|
* vim-patch:9.1.0297: Patch 9.1.0296 causes too many issues (#28263) (zeertzjq, 2024-04-11)
| | | | | | | | | | | | | | | | | Problem: Patch 9.1.0296 causes too many issues (Tony Mechelynck, chdiza, CI) Solution: Back out the change for now Revert "patch 9.1.0296: regexp: engines do not handle case-folding well" This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It needs more work. fixes: vim/vim#14487 https://github.com/vim/vim/commit/c97f4d61cde24030f2f7d2318e1b409a0ccc3e43 Co-authored-by: Christian Brabandt <cb@256bit.org>
* vim-patch:9.1.0296: regexp: engines do not handle case-folding well (#28259) (zeertzjq, 2024-04-10)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Regex engines do not handle case-folding well Solution: Correctly calculate byte length of characters to skip When the regexp engine compares two utf-8 codepoints case insensitively it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This however is not necessarily true because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same. However in that case, it should only step over the single byte value 's' so by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure: - we try to match the correct length for the pattern and the text - in case of a match, we step over it correctly The same thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. A related issue turned out, when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. fixes: vim/vim#14294 closes: vim/vim#14433 https://github.com/vim/vim/commit/7a27c108e0509f3255ebdcb6558e896c223e4d23 Co-authored-by: Christian Brabandt <cb@256bit.org>
* vim-patch:9.1.0217: regexp: verymagic cannot match before/after a mark (#28074) (zeertzjq, 2024-03-28)
| | | | | | | | | | | | | | Problem: regexp: verymagic cannot match before/after a mark Solution: Correctly check for the very magic check (Julio B) Fix regexp parser for \v%>'m and \v%<'m Currently \v%'m works fine, but it is unable to match before or after the position of mark m. closes: vim/vim#14309 https://github.com/vim/vim/commit/46fa3c7e271eb2abb05a0d9e6dbc9c36c2b2da02 Co-authored-by: Julio B <julio.bacel@gmail.com>
* vim-patch:9.1.0105: Style: typos found (#27462) (zeertzjq, 2024-02-14)
| | | | | | | | | Problem: Style: typos found Solution: correct them (zeertzjq) closes: vim/vim#14023 https://github.com/vim/vim/commit/e71022082d6a8bd8ec3d7b9dadf3f9ce46ef339c
* vim-patch:9.1.0043: ml_get: invalid lnum when :s replaces visual selection (#27140) (zeertzjq, 2024-01-23)
| | | | | | | | | | | | | | | Problem: ml_get: invalid lnum when :s replaces visual selection (@ropery) Solution: substitute may decrement the number of lines in a buffer, so validate that the bottom line of the visual selection stays within the max buffer line fixes: vim/vim#13890 closes: vim/vim#13892 https://github.com/vim/vim/commit/7c71db3a58f658b4329b82ab603efa928d17bdbc Co-authored-by: Christian Brabandt <cb@256bit.org>
* vim-patch:9.1.0011: regexp cannot match combining chars in collection (#26992) (zeertzjq, 2024-01-12)
| | | | | | | | | | | | | | Problem: regexp cannot match combining chars in collection Solution: Check for combining characters in regex collections for the NFA and BT Regex Engine Also, while at it, make debug mode work again. fixes vim/vim#10286 closes: vim/vim#12871 https://github.com/vim/vim/commit/d2cc51f9a1a5a30ef5d2e732f49d7f495cae24cf Co-authored-by: Christian Brabandt <cb@256bit.org>
* refactor(IWYU): fix headers (dundargoc, 2024-01-11)
| | | | | | Remove `export` pragmas from defs headers, as they cause IWYU to believe that the definitions from the defs headers come from the main header, which is not what we really want.
* refactor: move structs from regexp_defs.h to regexp.c (#26899) (zeertzjq, 2024-01-05)
| | | | These structs are only used in other source files as pointers and their fields aren't accessed in other source files.
* refactor: remove redundant NOLINT comments (dundargoc, 2024-01-01)
|
* refactor: remove os_errmsg and os_msg functions (dundargoc, 2023-12-23)
| | | | Instead replace them with fprintf and printf.
* refactor: run IWYU on entire repo (dundargoc, 2023-12-21)
| | | | Reference: https://github.com/neovim/neovim/issues/6371.
* build: don't define FUNC_ATTR_* as empty in headers (#26317) (zeertzjq, 2023-11-30)
| | | | | | FUNC_ATTR_* should only be used in .c files with generated headers. Defining FUNC_ATTR_* as empty in headers causes misuses of them to be silently ignored. Instead don't define them by default, and only define them as empty after a .c file has included its generated header.
* refactor: fix headers with IWYU (dundargoc, 2023-11-28)
|
* refactor: rename types.h to types_defs.h (dundargoc, 2023-11-27)
|
* build(IWYU): fix includes for undo_defs.h (dundargoc, 2023-11-27)
|
* build(IWYU): fix includes for func_attr.h (dundargoc, 2023-11-27)
|
* build: rework IWYU mapping files (dundargoc, 2023-11-25)
| | | | | Create mappings for most of the C spec and some POSIX-specific functions. This is more robust than relying on files shipped with IWYU.
* build: adjust clang-tidy warning exclusion logic (dundargoc, 2023-11-20)
| | | | | | | Enable all clang-tidy warnings by default instead of disabling them. This ensures that we don't miss useful warnings on each clang-tidy version upgrade. A drawback of this is that it will force us to either fix or adjust the warnings as soon as possible.
* refactor: enable formatting for ternaries (dundargoc, 2023-11-20)
| | | | | | This requires removing the "Inner expression should be aligned" rule from clint as it prevents essentially any formatting regarding ternary operators.
* build: bump uncrustify version (dundargoc, 2023-11-19)
| | | | Biggest change is that uncrustify is silent during linting.
* vim-patch:9.0.2107: [security]: FPE in adjust_plines_for_skipcol (#26082) (zeertzjq, 2023-11-17)
| | | | | | | | | | | | | Problem: [security]: FPE in adjust_plines_for_skipcol Solution: don't divide by zero, return zero Prevent a floating point exception when calculating w_skipcol (which can happen with a small window when the number option is set and cpo+=n). Add a test to verify https://github.com/vim/vim/commit/cb0b99f0672d8446585d26e998343dceca17d1ce Co-authored-by: Christian Brabandt <cb@256bit.org>
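The fix described here ("don't divide by zero, return zero") follows a common guard pattern. The sketch below is only an illustration of that pattern; `rows_for_cols` and its parameters are hypothetical and do not reproduce adjust_plines_for_skipcol().

```c
#include <assert.h>

// Hypothetical helper: how many full rows of `width` columns fit into
// `cols` columns. Integer division by zero is undefined behavior in C and
// commonly surfaces as a floating point exception signal, so a
// non-positive width is answered with 0 instead of dividing.
int rows_for_cols(int cols, int width)
{
  assert(cols >= 0);  // this sketch only covers non-negative column counts
  if (width <= 0) {
    return 0;
  }
  return cols / width;
}
```

With this guard, a degenerate width such as the one a very small window can produce yields 0 rather than a crash.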
* vim-patch:9.0.1532: crash when expanding "~" in substitute causes very long text (zeertzjq, 2023-11-17)
| | | | | | | | | Problem: Crash when expanding "~" in substitute causes very long text. Solution: Limit the text length to MAXCOL. https://github.com/vim/vim/commit/ab9a2d884b3a4abe319606ea95a5a6d6b01cd73a Co-authored-by: Bram Moolenaar <Bram@vim.org>
* refactor: iwyu (#26062) (zeertzjq, 2023-11-16)
|
* build: remove PVS (dundargoc, 2023-11-12)
| | | | | | | We already have an extensive suite of static analysis tools we use, which causes a fair bit of redundancy as we get duplicate warnings. PVS is also prone to give false warnings which creates a lot of work to identify and disable.
* refactor: remove redundant casts (dundargoc, 2023-11-11)
|
* refactor: the long goodbye (dundargoc, 2023-11-05)
| | | | | | long is 32 bits on Windows, while it is 64 bits on other architectures. This makes the type suboptimal for a codebase meant to be cross-platform. Replace it with more appropriate integer types.
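To illustrate why a platform-dependent `long` is awkward here, the sketch below uses fixed-width types from <stdint.h> so the result has the same range everywhere. `total_bytes` is a hypothetical example function, not something taken from the refactor.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical example: multiply two 32-bit counts into a 64-bit result.
// With `long` as the result type, this could overflow wherever long is
// only 32 bits wide; int64_t is 64 bits on every platform that provides it.
int64_t total_bytes(int32_t lines, int32_t bytes_per_line)
{
  assert(lines >= 0 && bytes_per_line >= 0);
  // Widen before multiplying so the product is computed in 64 bits.
  return (int64_t)lines * bytes_per_line;
}
```

For instance, 100000 lines of 100000 bytes each give 10000000000, which does not fit in 32 bits.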
* refactor: combine regexp files (dundargoc, 2023-11-05)
| | | | | | regexp_bt.c and regexp_nfa.c are inlined into regexp.c instead of included as a header. This lets developer tools like clang-tidy and clangd understand the code better.
* refactor: the long goodbye (dundargoc, 2023-10-09)
| | | | | | long is 32 bits on Windows, while it is 64 bits on other architectures. This makes the type suboptimal for a codebase meant to be cross-platform. Replace it with more appropriate integer types.