Classic Emacs syntax highlighting is based on regular expressions ("font-lock-mo...

amitp · on March 2, 2025

(author here) I agree, the `type` example could be done with regular expressions. In part 2 I'm planning to describe the real reason I was using tree-sitter here. I wanted to highlight certain combinations of operations based on the naming conventions I use in one of my projects. In particular, I want to catch a function call where a function named "x_to_y" has an argument with a name that does not appear to be an "x". However, while writing part 1 I realized that I could probably do that with a regular expression…
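A rough sketch of that regex idea, written in Python rather than Emacs Lisp. The pattern, the function name `suspicious_calls`, and the rule "the argument name must contain x" are all assumptions made for illustration, not the author's actual code. It only handles single-argument calls with a bare identifier.

```python
import re

# Assumed convention: a converter is named "<x>_to_<y>", and the name of
# its single argument should mention "<x>" somewhere.
CALL_RE = re.compile(
    r"\b([a-z]+)_to_([a-z]+)\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)"
)

def suspicious_calls(source):
    """Return (function, argument) pairs whose argument does not look like an "x"."""
    hits = []
    for match in CALL_RE.finditer(source):
        src_kind, dst_kind, arg = match.groups()
        # Crude heuristic: flag the call if "<x>" is not a substring of the argument name.
        if src_kind not in arg.lower():
            hits.append((f"{src_kind}_to_{dst_kind}", arg))
    return hits
```

A regex like this cannot handle nested calls or expressions as arguments, which is where a real parse tree would start to pay off.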

kleiba · on March 2, 2025

Sounds interesting, looking forward to part 2 then!

neilv · on March 2, 2025

In addition to leaning mostly on regexps (used in a few ways), the ancient Emacs `font-lock` highlighting also uses "syntax classes" of characters to help tokenize/lex and structure (e.g., is this character an identifier constituent, does it start a string literal, does it start a structural grouping like a parenthesis, etc.). There are also some ways to insert arbitrary code to do some things that are harder, like non-regexp lookahead. You can also annotate pieces of text as you go through it, to cache information.

The rules for indenting are actually implemented differently, even though they also involve some kind of parse. And it's not unusual to have to cache context information about the current line, for performance, so that you don't have to look back at preceding lines until you're satisfied you have enough context to indent the current line. The functions to indent multiple lines at once of course might represent this context without having to annotate the buffer.

> you have a syntax error: how does then your syntax highlighter cope with that?

I wrote (but didn't release) an all-new language-specific incremental fast parser for Emacs that recovered from some syntax errors. My general approach was to pick a region of text that included the obvious syntax error, visually highlight it in red, annotate it so that a mouseover would hover an explanation bubble of what's wrong with it, and then continue the parse assuming some reasonable context. You can see screenshots at:

https://www.neilvandyke.org/quack/#meow

For example, for an unterminated string literal, it would error-highlight the opening quote and subsequent characters up to the first whitespace. For another example, a string literal with an invalid escape sequence would error-highlight the entire string literal up through the closing quote. Another example shown is detecting a character that can't occur in that context (a close-paren immediately after a comment-the-following-s-expression).
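To make the two string-literal rules above concrete, here is a minimal Python sketch. It is not the parser described in this comment; the function name `string_error_spans`, the set of valid escapes, and the span format are assumptions chosen for illustration.

```python
def string_error_spans(text):
    """Return (start, end, reason) spans that an editor could error-highlight."""
    valid_escapes = set('nt\\"')  # assumed escape set, for illustration only
    spans = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != '"':
            i += 1
            continue
        start, j = i, i + 1
        bad_escape = closed = False
        while j < n:
            if text[j] == "\\":
                if j + 1 < n and text[j + 1] not in valid_escapes:
                    bad_escape = True
                j += 2  # skip the backslash and the escaped character
                continue
            if text[j] == '"':
                closed = True
                break
            j += 1
        if closed:
            if bad_escape:
                # Invalid escape: flag the whole literal through the closing quote.
                spans.append((start, j + 1, "invalid escape"))
            i = j + 1
        else:
            # Unterminated: flag from the opening quote up to the first whitespace,
            # then resume scanning from there as if the string had ended.
            k = start
            while k < n and not text[k].isspace():
                k += 1
            spans.append((start, k, "unterminated string"))
            i = k
    return spans
```

Resuming right after the flagged region is what keeps the rest of the buffer highlighted normally instead of turning everything after the stray quote into one giant string.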

ssivark · on March 2, 2025

Very excited to see parsing for ill-defined states! I like your naming scheme of using animal sounds, but just wanted to bring to your attention that Emacs already has a popular package named meow (for modal editing)

https://github.com/meow-edit/meow

neilv · on March 2, 2025

Thanks for the heads-up on the name collision!

I just updated my page to acknowledge that there's a different project with that name, and I will rename my unreleased project.

(I'd mentioned Meow online several times, years ago, but understandable that they wouldn't have been aware of it, and I have no claim to the name, anyway. Not only was my project never released, but the community where I mostly mentioned it had/has a problem with many posts from our Google Group no longer showing up in Google search hits.)

> I like your naming scheme of using animal sounds,

It originally wasn't. :) The developers of the Scheme implementation family that's now called Racket developed a bespoke IDE for students, called DrScheme (as in doctor), which did some fancy things. For my much less fancy Emacs kludges, I named it "Quack", as in a fake doctor. The animal sounds only came when I needed a name for the successor to Quack.

krupan · on March 2, 2025

Hopefully it copes very poorly so you see the syntax error quickly and fix it :-)

Only half joking