It’s yet another boring day for you and your colleagues at the offices of MegaCorpMoneyMaker, the company you work for. Davina, your fellow developer, seems to be fascinated by a tree you can see through a small window; curious, you ask her what she’s thinking about.
“I was considering configuring Tree-sitter for the Best Editor In The Universe™!”, she answers with sudden determination. Wondering what this Tree-sitter she’s speaking about is, but too afraid to show your lack of knowledge, you answer that sitting on a tree has nothing to do with editors, and she should go see a doctor. She looks at you with amusement, before pulling you suddenly to her computer. “Tree-sitter is a library able to create a parse tree from some source code. Neovim can use it to implement different functionalities, like syntax highlighting for example.”
Surely, something fishy is going on. Syntax highlighting has existed since the dawn of humanity! There’s only one possible explanation: Davina finally lost her mind after too much boredom.
“Why on Earth is this Tree-sitter thing better than the usual syntax highlighting?” you ask, as innocent as a baby from the womb.
“I will show you!”, answers Davina. This article is the result of her explanation, captured by The Old Gods to enlighten the ones who want to get The Knowledge. The author is only the messenger.
We’ll see, more specifically:
- Why use Tree-sitter in the first place.
- How Tree-sitter works.
- How to enable syntax highlighting using Tree-sitter in Neovim.
- How to customize a color scheme for this syntax highlighting.
- Which plugins can help you use Tree-sitter for other functionalities.
There’s also a companion repository containing all the files we’ll use in this article.
Are you ready to climb up the tree?
Why Use Tree-sitter?
Tree-sitter is a small C library used to parse some source code (often a file at a time). Because it’s fast and small, it can be easily embedded into text editors or IDEs to answer all their parsing needs: syntax highlighting, code analysis, or incremental selection for example.
Before Tree-sitter, your beautiful source code was parsed using some sort of regex engine in most editors. But there are many problems with this solution: it’s potentially slow and inaccurate.
Regex engines, depending on their implementation and the functionalities they offer, can indeed be slow. For example, your editor might need to parse the entire source code each time you modify it, degrading performance even further. Regexes can also be inaccurate in many situations: it’s difficult to parse nested constructs, for example.
On top of that, your regexes can be very different from one language to another; look at the syntax between C++ and Clojure for example. You’ll need a different set of regexes for each.
Tree-sitter tries to solve these problems by implementing the following:
- Tree-sitter can parse your source code and spit out a parse tree, also called CST (Concrete Syntax Tree). You’ll need one parser per programming language, but the CST has the same structure, and is explored the same way, for every single one of them. This brings needed consistency, making the development of functionalities using Tree-sitter easier.
- The Tree-sitter library was optimized for speed.
- The parsers are often faster than a bunch of regexes. But it might not be the case: it entirely depends on their implementation.
- If you modify your source code, the parser only parses what was modified, and only updates the section of the CST which changed. This is called incremental parsing, and it makes Tree-sitter even faster.
- Using a tree is more accurate than a bunch of regexes, correctly identifying tokens even when the source code is quite complex.
Now that we’ve seen why Tree-sitter is better than a bunch of regexes, let’s get closer to this beast and look at how it really works.
How Tree-sitter Works
Grammar File
To parse our source code and create a CST, we first need a specific parser for the programming language of our choice. A parser can be generated from a grammar file grammar.json (or grammar.js), describing all the constructs of the language, and how to write them in terms of other constructs. The Tree-sitter CLI gives you the command “generate”, taking the grammar file as input to output the parser itself (a C file).
Neovim itself depends on the Tree-sitter library rather than on this CLI, so you might need to install the CLI separately if it’s not already on your system.
Here’s a small extract of a grammar file for the Lua programming language:
"if_statement": {
  "type": "SEQ",
  "members": [
    {
      "type": "STRING",
      "value": "if"
    },
    ...
You can find the complete file in the companion repository.
The parser created from the grammar file can then be used to parse any file containing source code of the chosen programming language (Lua here), and create the CST we all desire.
Parsers
Without going too much into the nitty-gritty, the parser includes a lexer and an array mapping a given token to an action. For example, if the parser finds an if in the source code, it should create an if_statement node in the CST. Each node of the tree includes the token, as well as its beginning and end in the source code.
Two algorithms are used to parse this source code: LR and GLR parsing. You can look at this paper if you want to know more about them.
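If you want to see these nodes for yourself, here’s a minimal sketch you can run inside Neovim (with :luafile % for example). It assumes that a Lua parser is already available in your runtimepath; the function dump and the sample string are only there for illustration.

```lua
-- Assumption: a Lua parser can be found in the runtimepath.
local source = "if x then print(x) end"

-- Build a parser working on a plain string instead of a buffer.
local parser = vim.treesitter.get_string_parser(source, "lua")
local tree = parser:parse()[1]
local root = tree:root()

-- Hypothetical helper: print each named node with its type
-- and its beginning and end (row:column) in the source.
local function dump(node, depth)
  local start_row, start_col, end_row, end_col = node:range()
  print(string.format("%s%s [%d:%d - %d:%d]",
    string.rep("  ", depth), node:type(),
    start_row, start_col, end_row, end_col))
  for child in node:iter_children() do
    if child:named() then
      dump(child, depth + 1)
    end
  end
end

dump(root, 0)
```

The exact node names you’ll see depend on the grammar used to generate your parser.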
Queries
It’s great to have a tree, but how can we use it? Let’s take syntax highlighting to illustrate the process. If we want all the conditionals of our source code (if, else, and the like) to be red, we need to:
- Query the tree to find all the nodes we want.
- Capture these nodes.
- Map the capture to the color of our choice (red here).
The LISP lovers out there will be glad to learn that these queries use a simple language based on S-expressions; it’s basically a syntax relying heavily on parentheses.
Queries are written in files with extension “scm”, like source code written in Scheme (a LISP dialect). Don’t be fooled: this query language has little in common with Scheme (except maybe the use of S-expressions in both cases).
For example, let’s look at the file highlights.scm, gathering queries for highlighting the programming language Lua. Here’s one query:
(if_statement
  [
    "if"
    "elseif"
    "else"
    "then"
    "end"
  ] @conditional)
Again, you can find the complete file in the companion repository.
This query looks at conditionals (the tokens if, elseif, else, and so on) in a CST created from some Lua source code, and captures them under the name @conditional. In Neovim, we can then use this capture to highlight these conditionals.
Most of the time, each specific feature of Neovim using Tree-sitter (like syntax highlighting) will use a different set of queries.
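To watch a query capture some nodes, you can try the following sketch from inside Neovim. It reuses the query for conditionals and runs it against a small string. I’m assuming here Neovim 0.9 or greater (for vim.treesitter.query.parse) and a Lua parser matching the queries from nvim-treesitter; the sample source is made up.

```lua
-- Assumptions: Neovim 0.9+ and a Lua parser in the runtimepath.
local source = [[
if x then
  print(x)
else
  print("nothing")
end
]]

local parser = vim.treesitter.get_string_parser(source, "lua")
local root = parser:parse()[1]:root()

-- The same query as the one used for highlighting conditionals.
local query = vim.treesitter.query.parse("lua", [[
  (if_statement
    [
      "if"
      "elseif"
      "else"
      "then"
      "end"
    ] @conditional)
]])

-- Each iteration gives the capture id and the captured node.
for id, node in query:iter_captures(root, source) do
  local row, col = node:start()
  print(string.format("@%s %q at %d:%d",
    query.captures[id],
    vim.treesitter.get_node_text(node, source),
    row, col))
end
```

Every line printed is one captured node; it’s this list of captures that a feature like syntax highlighting maps to colors.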
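To summarize the process: the Tree-sitter CLI generates a parser from a grammar file, the parser turns our source code into a CST, queries capture some nodes of this CST, and Neovim maps these captures to highlights.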
Syntax Highlighting with Tree-sitter and Neovim
Now that we understand better how Tree-sitter works, let’s dive into a concrete example: how can we enable Lua syntax highlighting using Tree-sitter in Neovim?
First, it’s important to note that the support for Tree-sitter in our favorite editor is still experimental. Future changes might break your carefully crafted configuration, for example. But it works well enough to be able to use it today.
We could directly use the plugin nvim-treesitter to use Tree-sitter with Neovim. But I think it’s useful to try to configure the syntax highlighting for Lua without any plugin at the beginning, for two reasons:
- It helps understand how Tree-sitter works with Neovim; it’s useful if you run into some problems trying to configure it.
- Some users (including me) don’t necessarily want to install 938792387 plugins, especially Neovim ones. In my experience, they tend to change often and break my configuration, especially when Neovim itself gets a version bump. It’s even more true when we deal with experimental features.
By default, Vim and Neovim use regexes for syntax highlighting. It can be set on and off with the Ex commands :syntax on and :syntax off respectively. Now, if you want to use Tree-sitter, I’d recommend turning off the syntax, to be sure that the highlighting indeed uses Tree-sitter. It’s normally set off by default when enabling syntax highlighting with Tree-sitter, but setting it off early can show you if some of your plugins require this regex-based syntax to be on.
To enable our syntax highlighting for Lua, we first need a Lua parser. It needs to be a shared library (with the extension so) for Neovim to be able to use it.
It’s where we bump into our first problem: as we saw above, the Tree-sitter CLI can generate parsers from a grammar file for a specific programming language; but this parser is a simple C file, not a shared library. You can find many of these parsers directly from the official Tree-sitter documentation, so we don’t need to generate them manually. But you won’t easily find shared libraries for these parsers.
For example, here’s a GitHub repo with a Lua parser. The parser is called parser.c. To generate a shared library, you’ll need to download both files parser.c and scanner.c, and then use a C compiler.
Using gcc on Linux, you can run the following command:
gcc -o lua.so -shared parser.c scanner.c -Os -fPIC
Why do we need this file scanner.c now? It depends on the parser; some will need one of these scanner files, others won’t. As a general rule of thumb, if you see a file scanner.c (or scanner.cc) in the directory src of a parser’s source code, always use it to compile your shared library.
You can also look at this file from the plugin nvim-treesitter to see what files are used to compile some of these parsers.
You won’t necessarily need the option -fPIC; again, it depends on the parser. Don’t worry about that: if you need it, your compiler will throw an error message and abort the compilation anyway.
The command above will output the wonderful file lua.so. The name of the parser should be the name of the programming language it parses.
You’ll then need to copy this file into one of the directories of Neovim’s runtimepath, in a new “parser” subfolder. For example, on my Linux-based system, I’d need to copy lua.so into ~/.config/nvim/parser/lua.so.
Now that we have our parser in the form of a shared library, we need some queries to capture the nodes in order to highlight them. The plugin nvim-treesitter can help us again: it includes many queries for many programming languages, including some queries for Lua.
We need to get the file highlights.scm and put it in Neovim’s runtimepath again, under the directory queries/lua. For example, on my system, I’d need to download the query file in ~/.config/nvim/queries/lua/highlights.scm.
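If you’re not sure that Neovim can see your files, here’s a small sketch to check what the runtimepath contains. It only assumes the file names used in this article (lua.so and highlights.scm); run it inside Neovim.

```lua
-- List every matching file in the runtimepath
-- (the second argument asks for all the matches, not only the first).
local parsers = vim.api.nvim_get_runtime_file("parser/lua.so", true)
local queries = vim.api.nvim_get_runtime_file("queries/lua/highlights.scm", true)

print("parsers: " .. vim.inspect(parsers))
print("queries: " .. vim.inspect(queries))
```

If a list is empty, the file is not where Neovim expects it. If you see more than one path, another copy (from a plugin, for example) might be picked up instead of yours.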
Finally, we can try to open a Lua file in our buffer and enable Tree-sitter highlighting with the following Ex command:
:lua vim.treesitter.start()
The function start() can have two arguments:
- The buffer number
- The programming language we want to parse.
If you don’t specify the buffer, it will try to parse the current one by default (the buffer 0). If you don’t specify the second argument, it will take the filetype of the buffer and try to find a parser with this name, in our case lua.so.
The following Ex command is equivalent to the one above:
:lua vim.treesitter.start(0, "lua")
This command will also turn off the default regex-based syntax highlighting. It’s like running the Ex command :syntax off.
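Since the function throws an error when it can’t find a parser, you might prefer a more defensive version. Here’s a possible sketch; the helper start_treesitter is hypothetical (it’s not part of Neovim), and I’m assuming Neovim 0.8 or greater.

```lua
-- Hypothetical helper: try to start Tree-sitter highlighting,
-- and show a warning instead of an error if it fails.
local function start_treesitter(bufnr, lang)
  local ok, err = pcall(vim.treesitter.start, bufnr, lang)
  if not ok then
    vim.notify("Tree-sitter not started: " .. tostring(err), vim.log.levels.WARN)
  end
  return ok
end

start_treesitter(0, "lua")

-- To stop Tree-sitter highlighting for the current buffer:
-- vim.treesitter.stop(0)
```

Calling vim.treesitter.stop() should bring back the regex-based highlighting for the buffer, which is handy to compare both.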
That’s it! You now have some beautiful syntax highlighting powered by Tree-sitter.
You can find all the files necessary to create the shared library, as well as the shared library and the query file, in this companion repository.
That’s great, but how can we customize the color scheme of the syntax highlighting?
:help treesitter
:help treesitter-parsers
:help treesitter-query
Tree-sitter and Color Schemes
Good news, everyone! Vim’s color schemes are compatible with Tree-sitter. If you want to keep the same color scheme as before, you don’t have to do anything; it should mostly work.
You can also use specific highlight groups for Tree-sitter, allowing you to customize the color scheme. Before Neovim 0.8, these highlight groups were all prefixed with “TS”; for example “TSBoolean”. But for Neovim 0.8 and up, these highlight groups are not relevant anymore. I assume here that you’re using Neovim 0.8 or greater; if not, you can find these deprecated highlight groups here.
With Neovim 0.8 and higher, we can use the captures of our Tree-sitter queries as highlight groups. For example, if you open the file highlights.scm, you’ll find queries like the following:
(repeat_statement
  [
    "repeat"
    "until"
  ] @repeat)
(if_statement
  [
    "if"
    "elseif"
    "else"
    "then"
    "end"
  ] @conditional)
You can then use the captures @repeat and @conditional in a color scheme file. For example, using Vimscript:
hi @conditional ctermfg=red ctermbg=NONE cterm=NONE
hi @repeat ctermfg=blue ctermbg=NONE cterm=NONE
You can write these two lines in a file mycolorscheme.vim, put it in your runtimepath in the subfolder “colors” (for example ~/.config/nvim/colors/mycolorscheme.vim), and load it with the Ex command :colors mycolorscheme.
You can do the same in Lua; look at :help nvim_set_hl.
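As a rough Lua counterpart of the two Vimscript lines for @conditional and @repeat, here’s a minimal sketch. I’m assuming that nvim_set_hl accepts color names for ctermfg, like the command :hi does.

```lua
-- The first argument (0) targets the global highlight namespace.
-- Assumption: color names are accepted for ctermfg, as with :hi.
vim.api.nvim_set_hl(0, "@conditional", { ctermfg = "red" })
vim.api.nvim_set_hl(0, "@repeat", { ctermfg = "blue" })
```

You could put these lines in a Lua color scheme file (for example colors/mycolorscheme.lua in your runtimepath) instead of the Vimscript one.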
You can even modify the queries if you want to customize your color scheme even more, and create your own beautiful little world. If you want a more complete example, you can look at my own color scheme. It’s far from perfect, but at least it’s quite simple.
Finally, you can automatically enable Tree-sitter highlighting for specific filetypes using the ftplugin directory of your runtimepath. For example, I can add the following line in my file ~/.config/nvim/ftplugin/lua.vim to automatically enable Tree-sitter highlighting for Lua buffers:
lua vim.treesitter.start()
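If you’d rather keep everything in a single Lua file (your init.lua for example) instead of creating one ftplugin file per filetype, an autocommand is a possible alternative. This is only a sketch: the list of filetypes is up to you, and it assumes a parser exists for each of them.

```lua
-- Start Tree-sitter highlighting for every buffer
-- whose filetype is in the list "pattern".
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "lua" },
  callback = function(args)
    -- pcall: a missing parser shouldn't throw an error
    -- each time a file is opened.
    pcall(vim.treesitter.start, args.buf)
  end,
})
```

The language is not given to start() here, so it’s deduced from the filetype of the buffer.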
If you want to know more about the runtimepath and its power, I’ve written an article about that here.
:help treesitter-highlight
:help 'runtimepath'
:help ftplugin
:help nvim_set_hl
Neovim Plugin
To manually enable syntax highlighting for one specific programming language, we had to find a parser, compile it, find the queries, and move all these files into the right directories. Looking at this process, a word comes to my mind: cumbersome.
Thankfully, the plugin nvim-treesitter can automate this process for you.
The nvim-treesitter Plugin
We already spoke about it above: nvim-treesitter is the plugin everybody seems to use to easily benefit from Tree-sitter. It can compile parsers using one Ex command, and it provides many queries for many different functionalities (called “modules”) and programming languages.
At the time of writing, these modules are:
- Highlighting
- Incremental selection
- Indentation
- Folding
They are not necessarily available for all programming languages, however.
I won’t explain how to configure this plugin in this article; the README already does a great job in that regard. But I can give you a brief summary of the most useful commands:
Ex command | Description |
---|---|
TSInstall <language> | Compile a parser for the language <language> , and put it in the “parser” directory of the plugin. |
TSUpdate <language> | Update the parser for the language <language> , or update all of them if <language> is not specified. |
TSUninstall <language> | Uninstall a previously installed parser. |
TSInstallInfo | Display all the parsers available, with indicators for the ones you’ve already installed. |
TSModuleInfo | Display all the modules and their availability per language. |
TSEnable <module> | Enable a <module> for the current session. |
TSDisable <module> | Disable a <module> for the current session. |
There are more Ex commands available; I invite you to read the plugin’s README to know more about them.
For example, on my system, when I run TSInstall go, it will compile a go parser in ~/.config/nvim/plugged/nvim-treesitter/parser (I’m using vim-plug as plugin manager, that’s why I have this directory plugged).
Personally, I use this plugin only to compile the parsers I need, and I copy them into my own runtimepath (that is, in my case, from ~/.config/nvim/plugged/nvim-treesitter/parser to ~/.config/nvim/parser). I’m only interested in experimenting with Tree-sitter syntax highlighting for now, so it’s enough for me. That said, if you want to use other features, you’ll have to configure the plugin according to your own needs.
:help nvim-treesitter
:help nvim-treesitter-modules
The playground Plugin
There’s another plugin which can help you understand how Tree-sitter works. It allows you to display the CST, the nodes captured by queries, and more. Please welcome nvim-treesitter playground. You’ll need to have nvim-treesitter installed to use it.
As you might have guessed, it’s a great tool if you want to modify your queries to customize your syntax highlighting, or any other module offered by nvim-treesitter for that matter.
With Neovim 0.9 and above, you don’t need this plugin to look at the CST, however. These native Ex commands can do that for you:
Ex command | Description |
---|---|
:Inspect | Inspect the token under the cursor and display useful information. |
:InspectTree | Open a new window and display the CST for the current buffer. |
This is already really useful.
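You can get similar information from Lua too, which can be handy for a mapping or a quick experiment. A small sketch, assuming Neovim 0.9 or greater and Tree-sitter highlighting enabled in the current buffer:

```lua
-- Names of the captures applied to the token under the cursor
-- (the argument 0 means the current window).
local captures = vim.treesitter.get_captures_at_cursor(0)
print(vim.inspect(captures))

-- Open the tree viewer from Lua, like :InspectTree does.
vim.treesitter.inspect_tree()
```

The names printed are the ones you can use as highlight groups in your color scheme, prefixed with @.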
Coming back to our plugin nvim-treesitter playground, here are the Ex commands I find the most useful:
Ex command | Description |
---|---|
:TSPlaygroundToggle | Open a new window and display the CST for the current buffer. |
:TSHighlightCapturesUnderCursor | Show the syntax highlighting for the token under the cursor. |
:TSNodeUnderCursor | Show the Tree-sitter node for the token under the cursor. |
You can also use a couple of NORMAL mode keystrokes directly in the playground (the buffer created when you run :TSPlaygroundToggle). Here are the most interesting ones:
Keystroke | Description |
---|---|
ENTER | Move the cursor to the token represented by the node under the cursor. |
i | Toggle the highlight groups of each node. |
I | Toggle the language of each node. |
R | Reload the playground. |
There are other tools offered by the playground plugin to simplify the creation of queries: a query editor, an omni-completion for Tree-sitter query files, a linter…
If you want me to dive more into the details of Tree-sitter’s queries, don’t hesitate to let me know via the newsletter; you can also leave a comment.
Are We at the Top of Tree-sitter?
We’ve seen, in this article, how Tree-sitter works with Neovim, especially to highlight our source code. For other functionalities, Neovim only offers a thin interface, forcing users to implement these functionalities themselves if they don’t want to use an external plugin.
Let’s summarize what we’ve seen in this article:
- Tree-sitter can parse source code faster and more accurately than the usual regex-based parsing offered by many editors out there.
- A parser for a specific language can be generated from a grammar file thanks to the Tree-sitter CLI.
- A parser can parse source code and output a parse tree (or CST, for Concrete Syntax Tree).
- A parse tree can be queried thanks to special Tree-sitter queries, to capture a set of nodes, allowing them to be highlighted or folded, for example.
- It’s quite cumbersome to install parsers manually. The plugin nvim-treesitter can help you to automate the process.
After Davina finishes explaining the why and the how of Tree-sitter, you look again out of the window: you don’t see a tree anymore, just a bunch of nodes captured by wild queries. You’ve unraveled the curtain of reality to admire the essence of existence, you beautiful freak.