From cfdf68a7acde16597fbd896674af68c42361102c Mon Sep 17 00:00:00 2001
From: bfredl
Date: Thu, 8 Aug 2024 10:42:08 +0200
Subject: feat(mbyte): support extended grapheme clusters including more emoji

Use the grapheme break algorithm from utf8proc to support grapheme
clusters from recent unicode versions.

Handle variant selector VS16 turning some codepoints into double-width
emoji. This means we need to use ptr2cells rather than char2cells when
possible.
---
 runtime/doc/mbyte.txt             | 6 ++++++
 runtime/doc/news.txt              | 6 ++++++
 runtime/doc/options.txt           | 9 ++++++---
 runtime/lua/vim/_meta/options.lua | 9 ++++++---
 4 files changed, 24 insertions(+), 6 deletions(-)

(limited to 'runtime')

diff --git a/runtime/doc/mbyte.txt b/runtime/doc/mbyte.txt
index a8c5670352..47fd4f3343 100644
--- a/runtime/doc/mbyte.txt
+++ b/runtime/doc/mbyte.txt
@@ -646,6 +646,12 @@ widespread as file format.
 A composing or combining character is used to change the meaning of the
 character before it. The combining characters are drawn on top of the
 preceding character.
+
+Nvim largely follows the definition of extended grapheme clusters in UAX#29
+in the Unicode standard, with some modifications: An ascii char will always
+start a new cluster. In addition 'arabicshape' enables the combining of some
+arabic letters, when they are shaped to be displayed together in a single cell.
+
 Too big combined characters cannot be displayed, but they can still be
 inspected using the |g8| and |ga| commands described below.
 When editing text a composing character is mostly considered part of the
diff --git a/runtime/doc/news.txt b/runtime/doc/news.txt
index 80511ccb87..b7e1e0c84f 100644
--- a/runtime/doc/news.txt
+++ b/runtime/doc/news.txt
@@ -200,6 +200,12 @@ These existing features changed their behavior.
   top lines are calculated using screen line numbers which take virtual lines
   into account.
 
+• The implementation of grapheme clusters (or combining chars |mbyte-combining|)
+  was upgraded to closely follow extended grapheme clusters as defined by UAX#29
+  in the unicode standard. Noteworthily, this enables proper display of many
+  more emoji characters than before, including those encoded with multiple
+  emoji codepoints combined with ZWJ (zero width joiner) codepoints.
+
 ==============================================================================
 REMOVED FEATURES                                                *news-removed*
 
diff --git a/runtime/doc/options.txt b/runtime/doc/options.txt
index f44e0954a5..4945a1b46d 100644
--- a/runtime/doc/options.txt
+++ b/runtime/doc/options.txt
@@ -2217,9 +2217,12 @@ A jump table for the options with a short description can be found at |Q_op|.
 			global
 	When on all Unicode emoji characters are considered to be full width.
 	This excludes "text emoji" characters, which are normally displayed as
-	single width. Unfortunately there is no good specification for this
-	and it has been determined on trial-and-error basis. Use the
-	|setcellwidths()| function to change the behavior.
+	single width. However, such "text emoji" are treated as full-width
+	emoji if they are followed by the U+FE0F variant selector.
+
+	Unfortunately there is no good specification for this and it has been
+	determined on trial-and-error basis. Use the |setcellwidths()|
+	function to change the behavior.
 
 						*'encoding'* *'enc'*
 'encoding' 'enc'	string	(default "utf-8")
diff --git a/runtime/lua/vim/_meta/options.lua b/runtime/lua/vim/_meta/options.lua
index b4ac478b61..05c9b89d77 100644
--- a/runtime/lua/vim/_meta/options.lua
+++ b/runtime/lua/vim/_meta/options.lua
@@ -1829,9 +1829,12 @@ vim.go.ead = vim.go.eadirection
 
 --- When on all Unicode emoji characters are considered to be full width.
 --- This excludes "text emoji" characters, which are normally displayed as
---- single width. Unfortunately there is no good specification for this
---- and it has been determined on trial-and-error basis. Use the
---- `setcellwidths()` function to change the behavior.
+--- single width. However, such "text emoji" are treated as full-width
+--- emoji if they are followed by the U+FE0F variant selector.
+---
+--- Unfortunately there is no good specification for this and it has been
+--- determined on trial-and-error basis. Use the `setcellwidths()`
+--- function to change the behavior.
 ---
 --- @type boolean
 vim.o.emoji = true
--
cgit