From f61efe3fe77c9a517dccb9fd5ff7f16c0660ced4 Mon Sep 17 00:00:00 2001
From: Evgeni Chasnovski
Date: Thu, 18 Jul 2024 18:26:27 +0300
Subject: perf(filetype): implement parent pattern pre-matching (#29660)

Problem: calling `vim.filetype.match()` has a performance bottleneck in
  that it has to match a lot of Lua patterns against several versions of
  the input file name. This can be a problem if users need to call it
  synchronously many times.

Solution: add "parent pattern pre-matching", which can be used to quickly
  reject several potential pattern matches at the (usually rare) cost of
  one extra Lua pattern match.

  A "parent pattern" is a manually added and tracked grouping of filetype
  patterns which should have two properties:
    - Match at least the same set of strings as its filetype patterns,
      but not too much more.
    - Be fast to match.

  To be effective, a group should consist of at least three filetype
  patterns.

  Example: for a filetype pattern ".*/etc/a2ps/.*%.cfg", both "/etc/" and
  "%.cfg" are good parent patterns (prefer the one which can group more
  filetype patterns).

  After this commit, `vim.filetype.match()` on most inputs runs ~3.4 times
  faster (while some inputs may see less impact if they match many parent
  patterns).
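The pre-matching idea described above can be sketched in a few lines of Lua. This is a minimal illustration under stated assumptions, not the code added to `runtime/lua/vim/filetype.lua`: the `groups` table layout, the `match_grouped` helper, the implicit `^...$` anchoring, and every pattern marked "hypothetical" are made up for the example. Only ".*/etc/a2ps/.*%.cfg" comes from the text above.

```lua
-- Illustrative sketch only: table layout, helper name and the entries marked
-- "hypothetical" are assumptions, not Nvim's actual filetype data.
local groups = {
  {
    parent = '/etc/', -- cheap check: short explicit string, no quantifiers
    patterns = {
      { '.*/etc/a2ps/.*%.cfg', 'a2ps' },
      { '.*/etc/demo_one/.*%.conf', 'demo_one' },  -- hypothetical
      { '.*/etc/demo_two/.*%.rules', 'demo_two' }, -- hypothetical
    },
  },
  {
    parent = 'nginx',
    patterns = {
      { '.*/nginx/.*%.demo', 'demo_nginx' },     -- hypothetical
      { '.*/nginx/sites/.*', 'demo_sites' },     -- hypothetical
      { '.*/usr/nginx/.*%.inc', 'demo_inc' },    -- hypothetical
    },
  },
}

-- Returns the matched filetype (or nil) and the number of filetype patterns
-- that actually had to be tried against `path`.
local function match_grouped(path)
  local tried = 0
  for _, group in ipairs(groups) do
    -- One extra match per group; if it fails, the whole group is skipped.
    if path:find(group.parent) then
      for _, entry in ipairs(group.patterns) do
        tried = tried + 1
        -- Assumption: filetype patterns must match the full path.
        if path:find('^' .. entry[1] .. '$') then
          return entry[2], tried
        end
      end
    end
  end
  return nil, tried
end

-- Usage: the first path enters only the "/etc/" group, the second path is
-- rejected by both parent patterns without trying any filetype pattern.
print(match_grouped('/etc/a2ps/site.cfg'))
print(match_grouped('/home/user/notes.txt'))
```

A path that matches several parent patterns (for example one containing both "/etc/" and "nginx") still walks every matching group, which is why such inputs benefit less from pre-matching.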
---
 runtime/doc/dev_vimpatch.txt | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

(limited to 'runtime/doc')

diff --git a/runtime/doc/dev_vimpatch.txt b/runtime/doc/dev_vimpatch.txt
index d6e4ced054..6d450424c5 100644
--- a/runtime/doc/dev_vimpatch.txt
+++ b/runtime/doc/dev_vimpatch.txt
@@ -302,4 +302,40 @@ used in new documentation:
 - `{Only when compiled with ...}`: the vast majority of features have been
   made non-optional (see https://github.com/neovim/neovim/wiki/Introduction)
+==============================================================================
+FILETYPE DETECTION                                     *dev-vimpatch-filetype*
+
+Nvim's filetype detection behavior matches Vim, but is implemented as part of
+|vim.filetype| (see $VIMRUNTIME/lua/vim/filetype.lua).
+
+Pattern matching has several differences:
+- It is done using explicit Lua patterns (without implicit anchoring) instead
+  of Vim regexes: >
+    "*/debian/changelog" -> "/debian/changelog$"
+    "*/bind/db.*" -> "/bind/db%."
+<
+- Filetype patterns are grouped by their parent pattern to improve matching
+  performance. For this to work properly, parent pattern should:
+  - Match at least the same set of strings as filetype patterns inside it.
+    But not too much more.
+  - Be fast to match.
+
+  When adding a new filetype with pattern matching, consider the following:
+  - If there is already a group with appropriate parent pattern, use it.
+  - If there can be a fast and specific enough pattern to group at least
+    3 filetype patterns, add it as a separate grouped entry.
+
+  Good new parent pattern should be:
+  - Fast. Good rule of thumb is that it should be a short explicit string
+    (i.e. no quantifiers or character sets).
+  - Specific. Good rules of thumb (from best to worst):
+    - Full directory name (like "/etc/", "/log/").
+    - Part of a rare enough directory name (like "/conf", "git/").
+    - String reasonably rarely used in real full paths (like "nginx").
+
+  Example:
+  - Filetype pattern: ".*/etc/a2ps/.*%.cfg"
+  - Good parent: "/etc/"; "%.cfg$"
+  - Bad parent: "%." - fast, not specific; "/a2ps/.*%." - slow, specific
+
 
 vim:tw=78:ts=8:noet:ft=help:norl:
-- 
cgit