Diffstat (limited to 'runtime/doc/mbyte.txt')
-rw-r--r-- runtime/doc/mbyte.txt | 197
1 files changed, 53 insertions, 144 deletions
diff --git a/runtime/doc/mbyte.txt b/runtime/doc/mbyte.txt
index c87ed317d4..24d9d01af0 100644
--- a/runtime/doc/mbyte.txt
+++ b/runtime/doc/mbyte.txt
@@ -1,4 +1,4 @@
-*mbyte.txt* For Vim version 7.4. Last change: 2013 May 18
+*mbyte.txt* Nvim
VIM REFERENCE MANUAL by Bram Moolenaar et al.
@@ -14,26 +14,10 @@ For an introduction to the most common features, see |usr_45.txt| in the user
manual.
For changing the language of messages and menus see |mlang.txt|.
-{not available when compiled without the |+multi_byte| feature}
-
-
-1. Getting started |mbyte-first|
-2. Locale |mbyte-locale|
-3. Encoding |mbyte-encoding|
-4. Using a terminal |mbyte-terminal|
-5. Fonts on X11 |mbyte-fonts-X11|
-6. Fonts on MS-Windows |mbyte-fonts-MSwin|
-7. Input on X11 |mbyte-XIM|
-8. Input on MS-Windows |mbyte-IME|
-9. Input with a keymap |mbyte-keymap|
-10. Using UTF-8 |mbyte-utf8|
-11. Overview of options |mbyte-options|
-
-NOTE: This file contains UTF-8 characters. These may show up as strange
-characters or boxes when using another encoding.
+ Type |gO| to see the table of contents.
==============================================================================
-1. Getting started *mbyte-first*
+Getting started *mbyte-first*
This is a summary of the multibyte features in Vim. If you are lucky it works
as described and you can start using Vim without much trouble. If something
@@ -70,32 +54,26 @@ See |mbyte-locale| for details.
ENCODING
-If your locale works properly, Vim will try to set the 'encoding' option
-accordingly. If this doesn't work you can overrule its value: >
-
- :set encoding=utf-8
+Nvim always uses UTF-8 internally. Thus the 'encoding' option is always set
+to "utf-8" and cannot be changed.
-See |encoding-values| for a list of acceptable values.
+All the text that is used inside Vim will be in UTF-8. Not only the text in
+the buffers, but also in registers, variables, etc.
-The result is that all the text that is used inside Vim will be in this
-encoding. Not only the text in the buffers, but also in registers, variables,
-etc. 'encoding' is read-only after startup because changing it would make the
-existing text invalid.
-
-You can edit files in another encoding than what 'encoding' is set to. Vim
+You can edit files in encodings other than UTF-8. Nvim
will convert the file when you read it and convert it back when you write it.
See 'fileencoding', 'fileencodings' and |++enc|.
DISPLAY AND FONTS
-If you are working in a terminal (emulator) you must make sure it accepts the
-same encoding as which Vim is working with.
+If you are working in a terminal (emulator) you must make sure it accepts
+UTF-8, the encoding which Vim is working with. Otherwise only ASCII can
+be displayed and edited correctly.
-For the GUI you must select fonts that work with the current 'encoding'. This
+For the GUI you must select fonts that work with UTF-8. This
is the difficult part. It depends on the system you are using, the locale and
-a few other things. See the chapters on fonts: |mbyte-fonts-X11| for
-X-Windows and |mbyte-fonts-MSwin| for MS-Windows.
+a few other things.
For X11 you can set the 'guifontset' option to a list of fonts that together
cover the characters that are used. Example for Korean: >
@@ -125,7 +103,7 @@ The options 'iminsert', 'imsearch' and 'imcmdline' can be used to chose
the different input methods or disable them temporarily.
==============================================================================
-2. Locale *mbyte-locale*
+Locale *mbyte-locale*
The easiest setup is when your whole system uses the locale you want to work
in. But it's also possible to set the locale for one shell you are working
@@ -214,12 +192,11 @@ Or specify $LANG when starting Vim:
You could make a small shell script for this.
==============================================================================
-3. Encoding *mbyte-encoding*
+Encoding *mbyte-encoding*
-Vim uses the 'encoding' option to specify how characters are identified and
-encoded when they are used inside Vim. This applies to all the places where
-text is used, including buffers (files loaded into memory), registers and
-variables.
+In Nvim, UTF-8 is always used internally to encode characters.
+This applies to all the places where text is used, including buffers (files
+loaded into memory), registers and variables.
*charset* *codeset*
Charset is another name for encoding. There are subtle differences, but these
@@ -240,7 +217,7 @@ matter what language is used. Thus you might see the right text even when the
encoding was set wrong.
*encoding-names*
-Vim can use many different character encodings. There are three major groups:
+Vim can edit files in different character encodings. There are three major groups:
1 8bit Single-byte encodings, 256 different characters. Mostly used
in USA and Europe. Example: ISO-8859-1 (Latin1). All
@@ -255,11 +232,10 @@ u Unicode Universal encoding, can replace all others. ISO 10646.
Millions of different characters. Example: UTF-8. The
relation between bytes and screen cells is complex.
-Other encodings cannot be used by Vim internally. But files in other
+Only UTF-8 is used by Vim internally. But files in other
encodings can be edited by using conversion, see 'fileencoding'.
-Note that all encodings must use ASCII for the characters up to 128.
-Supported 'encoding' values are: *encoding-values*
+Recognized 'fileencoding' values include: *encoding-values*
1 latin1 8-bit characters (ISO 8859-1, also used for cp1252)
1 iso-8859-n ISO_8859 variant (n = 2 to 15)
1 koi8-r Russian
@@ -311,11 +287,11 @@ u ucs-4 32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u ucs-4le like ucs-4, little endian
The {name} can be any encoding name that your system supports. It is passed
-to iconv() to convert between the encoding of the file and the current locale.
+to iconv() to convert between UTF-8 and the encoding of the file.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
- :set encoding=8bit-cp1252
- :set encoding=2byte-cp932
+ :set fileencoding=8bit-cp1252
+ :set fileencoding=2byte-cp932
The MS-Windows codepage 1252 is very similar to latin1. For practical reasons
the same encoding is used and it's called latin1. 'isprint' can be used to
@@ -337,8 +313,7 @@ u ucs-2be same as ucs-2 (big endian)
u ucs-4be same as ucs-4 (big endian)
u utf-32 same as ucs-4
u utf-32le same as ucs-4le
- default stands for the default value of 'encoding', depends on the
- environment
+ default the encoding of the current locale.
For the UCS codes the byte order matters. This is tricky, use UTF-8 whenever
you can. The default is to use big-endian (most significant byte comes
@@ -363,13 +338,12 @@ or when conversion is not possible:
CONVERSION *charset-conversion*
Vim will automatically convert from one to another encoding in several places:
-- When reading a file and 'fileencoding' is different from 'encoding'
-- When writing a file and 'fileencoding' is different from 'encoding'
+- When reading a file and 'fileencoding' is different from "utf-8"
+- When writing a file and 'fileencoding' is different from "utf-8"
- When displaying messages and the encoding used for LC_MESSAGES differs from
- 'encoding' (requires a gettext version that supports this).
+ "utf-8" (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
- 'encoding'.
-- When reading or writing a |shada| file.
+ "utf-8".
Most of these require the |+iconv| feature. Conversion for reading and
writing files may also be specified with the 'charconvert' option.
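The automatic conversion on reading and writing a file can be pictured as a round trip through UTF-8. This is a minimal Python sketch, not how Nvim implements it (Nvim uses its internal converters or iconv()); the helper names `read_as_utf8` and `write_back` are hypothetical, and Python codec names stand in for Vim's encoding names.

```python
def read_as_utf8(raw: bytes, fileencoding: str) -> bytes:
    """Decode file bytes from `fileencoding` and return the UTF-8 bytes
    that would be held in the buffer (Nvim's internal encoding)."""
    return raw.decode(fileencoding).encode("utf-8")


def write_back(buffer_utf8: bytes, fileencoding: str) -> bytes:
    """Convert UTF-8 buffer bytes back to `fileencoding` for writing.

    Raises UnicodeEncodeError when a character has no representation
    in the target encoding.
    """
    return buffer_utf8.decode("utf-8").encode(fileencoding)


latin1_file = b"caf\xe9"                          # "café" stored as Latin-1
buf = read_as_utf8(latin1_file, "latin-1")
print(buf)                                        # b'caf\xc3\xa9'
print(write_back(buf, "latin-1") == latin1_file)  # True
```

Writing a character such as the euro sign back to Latin-1 fails, which is why a conversion can be impossible even though reading worked.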
@@ -408,11 +382,11 @@ Useful utilities for converting the charset:
*mbyte-conversion*
-When reading and writing files in an encoding different from 'encoding',
+When reading and writing files in an encoding different from "utf-8",
conversion needs to be done. These conversions are supported:
- All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
handled internally.
-- For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and
+- For MS-Windows, conversion from and
to any codepage should work.
- Conversion specified with 'charconvert'
- Conversion with the iconv library, if it is available.
@@ -427,51 +401,7 @@ neither of them can be found Vim will still work but some conversions won't be
possible.
==============================================================================
-4. Using a terminal *mbyte-terminal*
-
-The GUI fully supports multi-byte characters. It is also possible in a
-terminal, if the terminal supports the same encoding that Vim uses. Thus this
-is less flexible.
-
-For example, you can run Vim in a xterm with added multi-byte support and/or
-|XIM|. Examples are kterm (Kanji term) and hanterm (for Korean), Eterm
-(Enlightened terminal) and rxvt.
-
-UTF-8 IN XFREE86 XTERM *UTF8-xterm*
-
-This is a short explanation of how to use UTF-8 character encoding in the
-xterm that comes with XFree86 by Thomas Dickey (text by Markus Kuhn).
-
-Get the latest xterm version which has now UTF-8 support:
-
- http://invisible-island.net/xterm/xterm.html
-
-Compile it with "./configure --enable-wide-chars ; make"
-
-Also get the ISO 10646-1 version of various fonts, which is available on
-
- http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz
-
-and install the font as described in the README file.
-
-Now start xterm with >
-
- xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
-or, for bigger character: >
- xterm -u8 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1
-
-and you will have a working UTF-8 terminal emulator. Try both >
-
- cat utf-8-demo.txt
- vim utf-8-demo.txt
-
-with the demo text that comes with ucs-fonts.tar.gz in order to see
-whether there are any problems with UTF-8 in your xterm.
-
-For Vim you may need to set 'encoding' to "utf-8".
-
-==============================================================================
-5. Fonts on X11 *mbyte-fonts-X11*
+Fonts on X11 *mbyte-fonts-X11*
Unfortunately, using fonts in X11 is complicated. The name of a single-byte
font is a long string. For multi-byte fonts we need several of these...
@@ -607,20 +537,7 @@ Also make sure that you set 'guifontset' before setting fonts for highlight
groups.
==============================================================================
-6. Fonts on MS-Windows *mbyte-fonts-MSwin*
-
-The simplest is to use the font dialog to select fonts and try them out. You
-can find this at the "Edit/Select Font..." menu. Once you find a font name
-that works well you can use this command to see its name: >
-
- :set guifont
-
-Then add a command to your |ginit.vim| file to set 'guifont': >
-
- :set guifont=courier_new:h12
-
-==============================================================================
-7. Input on X11 *mbyte-XIM*
+Input on X11 *mbyte-XIM*
X INPUT METHOD (XIM) BACKGROUND *XIM* *xim* *x-input-method*
@@ -779,7 +696,7 @@ For example, when you are using kinput2 as |IM-server| and sh, >
<
==============================================================================
-8. Input on MS-Windows *mbyte-IME*
+Input on MS-Windows *mbyte-IME*
(Windows IME support) *multibyte-ime* *IME*
@@ -853,22 +770,23 @@ Cursor color when IME or XIM is on *CursorIM*
status is on.
==============================================================================
-9. Input with a keymap *mbyte-keymap*
+Input with a keymap *mbyte-keymap*
When the keyboard doesn't produce the characters you want to enter in your
text, you can use the 'keymap' option. This will translate one or more
(English) characters to another (non-English) character. This only happens
when typing text, not when typing Vim commands. This avoids having to switch
between two keyboard settings.
+{only available when compiled with the |+keymap| feature}
The value of the 'keymap' option specifies a keymap file to use. The name of
this file is one of these two:
- keymap/{keymap}_{encoding}.vim
+ keymap/{keymap}_utf-8.vim
keymap/{keymap}.vim
-Here {keymap} is the value of the 'keymap' option and {encoding} of the
-'encoding' option. The file name with the {encoding} included is tried first.
+Here {keymap} is the value of the 'keymap' option.
+The file name with "utf-8" included is tried first.
'runtimepath' is used to find these files. To see an overview of all
available keymap files, use this: >
@@ -916,7 +834,7 @@ keyboards and encodings.
The actual mappings are in the lines below "loadkeymap". In the example "a"
is mapped to "A" and "b" to "B". Thus the first item is mapped to the second
item. This is done for each line, until the end of the file.
-These items are exactly the same as what can be used in a |:lnoremap| command,
+These items are exactly the same as what can be used in a |:lmap| command,
using "<buffer>" to make the mappings local to the buffer.
You can check the result with this command: >
:lmap
@@ -931,8 +849,9 @@ Since Vim doesn't know if the next character after a quote is really an "a",
it will wait for the next character. To be able to insert a single quote,
also add this line: >
'' '
-Since the mapping is defined with |:lnoremap| the resulting quote will not be
-used for the start of another character.
+Since the mapping is defined with |:lmap| the resulting quote will not be
+used for the start of another character defined in the 'keymap'.
+It can be used in a standard |:imap| mapping.
The "accents" keymap uses this. *keymap-accents*
The first column can also be in |<>| form:
@@ -950,7 +869,7 @@ this is unusual. But you can use various ways to specify the character: >
A <char-0141> octal value
x <Space> special key name
-The characters are assumed to be encoded for the current value of 'encoding'.
+The characters are assumed to be encoded in UTF-8.
It's possible to use ":scriptencoding" when all characters are given
literally. That doesn't work when using the <char-> construct, because the
conversion is done on the keymap file, not on the resulting character.
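The `<char->` forms in a keymap follow Vim's <Char-> key notation: a plain number is decimal, a leading "0" makes it octal and a leading "0x" makes it hexadecimal. Since keymap characters are taken to be UTF-8, the result ends up in the buffer as UTF-8 bytes. A minimal Python sketch of that interpretation; `parse_char_notation` is a hypothetical helper, not part of Vim.

```python
import re

# Hypothetical helper for Vim's <char-N> notation:
# N is hex with a "0x" prefix, octal with a leading "0", otherwise decimal.
_CHAR_RE = re.compile(
    r"<char-(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)>", re.IGNORECASE
)


def parse_char_notation(text: str) -> str:
    """Return the character described by a <char-N> form."""
    m = _CHAR_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a <char-> form: {text!r}")
    num = m.group(1)
    if num[:2].lower() == "0x":
        code = int(num, 16)
    elif num.startswith("0") and len(num) > 1:
        code = int(num, 8)
    else:
        code = int(num, 10)
    return chr(code)


for form in ("<char-97>", "<char-0141>", "<char-0x61>", "<char-0x20ac>"):
    ch = parse_char_notation(form)
    # Show the character and the UTF-8 bytes it becomes in the buffer.
    print(form, repr(ch), ch.encode("utf-8").hex(" "))
```

The first three forms all denote "a"; the last one is the euro sign, stored as the three bytes e2 82 ac.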
@@ -1100,7 +1019,7 @@ Combining forms:
ﭏ 0xfb4f Xal alef-lamed
==============================================================================
-10. Using UTF-8 *mbyte-utf8* *UTF-8* *utf-8* *utf8*
+Using UTF-8 *mbyte-utf8* *UTF-8* *utf-8* *utf8*
*Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets. Therefore it is possible to write text in any language using
@@ -1140,8 +1059,7 @@ widespread as file format.
A composing or combining character is used to change the meaning of the
character before it. The combining characters are drawn on top of the
preceding character.
-Up to two combining characters can be used by default. This can be changed
-with the 'maxcombine' option.
+Up to six combining characters can be displayed.
When editing text a composing character is mostly considered part of the
preceding character. For example "x" will delete a character and its
following composing characters by default.
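To make "a composing character is part of the preceding character" concrete, this Python sketch groups each base character with the combining characters that follow it, using `unicodedata.combining()` (non-zero for most combining marks). It only approximates Nvim's rules: some marks that Vim's own tables treat as composing (for instance certain spacing marks) have combining class 0 and are missed here. The helper name `split_cells` is hypothetical.

```python
import unicodedata


def split_cells(text: str) -> list[str]:
    """Group each base character with the combining characters after it.

    Approximation: a character counts as combining when its canonical
    combining class is non-zero.
    """
    cells: list[str] = []
    for ch in text:
        if cells and unicodedata.combining(ch):
            cells[-1] += ch      # attach to the preceding character
        else:
            cells.append(ch)     # start a new character
    return cells


word = "e\u0301\u0323x"          # "e", combining acute, combining dot below, "x"
cells = split_cells(word)
print(len(word), len(cells))     # 4 2
# Like "x" on the first character: the base and its composing marks go together.
print("".join(cells[1:]))        # x
```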
@@ -1170,21 +1088,13 @@ Useful commands:
message is truncated, use ":messages").
- "g8" shows the bytes used in a UTF-8 character, also the composing
characters, as hex numbers.
-- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files. The
- default is to use the current locale for 'encoding' and set 'fileencodings'
- to automatically detect the encoding of a file.
+- ":set fileencodings=" forces using UTF-8 for all files. The
+ default is to automatically detect the encoding of a file.
STARTING VIM
-If your current locale is in an utf-8 encoding, Vim will automatically start
-in utf-8 mode.
-
-If you are using another locale: >
-
- set encoding=utf-8
-
-You might also want to select the font used for the menus. Unfortunately this
+You might want to select the font used for the menus. Unfortunately this
doesn't always work. See the system specific remarks below, and 'langmenu'.
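The "g8" command listed above shows the UTF-8 bytes of the character under the cursor, including its composing characters. A minimal Python sketch of the same idea follows; the function names are hypothetical, and the " + " separator between the base character and its composing characters is this sketch's own formatting choice, which may not match g8 exactly. It also shows why the relation between bytes and characters is not fixed: UTF-8 uses one to four bytes per code point.

```python
def g8_like(cell: str) -> str:
    """Hex bytes of each character in `cell` (a base character followed by
    its composing characters), joined with " + "."""
    return " + ".join(ch.encode("utf-8").hex(" ") for ch in cell)


def utf8_len(codepoint: int) -> int:
    """Number of bytes UTF-8 uses for a code point (1 to 4)."""
    if codepoint < 0x80:
        return 1
    if codepoint < 0x800:
        return 2
    if codepoint < 0x10000:
        return 3
    return 4


print(g8_like("\u00e9"))     # c3 a9
print(g8_like("e\u0301"))    # 65 + cc 81  ("e" plus combining acute accent)
print(utf8_len(0x1F600))     # 4
```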
@@ -1240,15 +1150,14 @@ not everybody is able to type a composing character.
==============================================================================
-11. Overview of options *mbyte-options*
+Overview of options *mbyte-options*
These options are relevant for editing multi-byte files. Check the help in
options.txt for detailed information.
-'encoding' Encoding used for the keyboard and display. It is also the
- default encoding for files.
+'encoding' Internal text encoding, always "utf-8".
-'fileencoding' Encoding of a file. When it's different from 'encoding'
+'fileencoding' Encoding of a file. When it's different from "utf-8"
conversion is done when reading or writing the file.
'fileencodings' List of possible encodings of a file. When opening a file
@@ -1276,4 +1185,4 @@ Contributions specifically for the multi-byte features by:
Taro Muraoka <koron@tka.att.ne.jp>
Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>
- vim:tw=78:ts=8:ft=help:norl:
+ vim:tw=78:ts=8:noet:ft=help:norl:
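The 'fileencodings' option from the overview is tried entry by entry when a file is opened: the first encoding that converts the file without an error is used, and when nothing fits the internal UTF-8 applies. A minimal Python sketch of that search, assuming Python codec names in place of Vim's encoding names, treating "ucs-bom" as a check for a UTF-8 byte order mark only, and skipping names the sketch cannot resolve (such as "default"); `detect_fileencoding` is hypothetical.

```python
import codecs


def detect_fileencoding(raw: bytes, fileencodings: list[str]) -> str:
    """Return the first entry that decodes `raw` without error.

    Simplifications: "ucs-bom" only checks for a UTF-8 BOM, unknown names
    are skipped, and an empty list (or no match) falls back to "utf-8".
    """
    for name in fileencodings:
        if name == "ucs-bom":
            if raw.startswith(codecs.BOM_UTF8):
                return "utf-8"
            continue
        try:
            raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue             # conversion failed or name unknown: try next
        return name
    return "utf-8"


order = ["ucs-bom", "utf-8", "latin-1"]
print(detect_fileencoding(b"caf\xc3\xa9", order))  # utf-8
print(detect_fileencoding(b"caf\xe9", order))      # latin-1
print(detect_fileencoding(b"caf\xe9", []))         # utf-8
```

Because any byte sequence decodes as Latin-1, an 8-bit encoding like it only makes sense as the last entry; an empty list behaves like ":set fileencodings=", forcing UTF-8.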