Show HN: Grep with colours written in Go (github.com/arsham)
127 points by arsham on June 5, 2018 | 82 comments


Do be sure to at least consider supporting no colour! http://no-color.org


I don't understand why this is a standard.

First: It requires that every command line tool add support for this environment variable. It may not be much effort for each project, but it adds up to a ton of developer time. Even if this standard gains traction, some apps will slip through the cracks. So in the success case, many users will still be annoyed that one or two of their tools print colors when they shouldn't.

Second: As the FAQ on that page mentions, people can configure their terminal emulators to squelch color. NO_COLOR only matters for users who want color in some applications and not others. In that case, are certain applications supposed to refuse supporting NO_COLOR? Will all users want the same set of programs to behave that way? Are users supposed to unset NO_COLOR before running those programs?

At that point, it seems like a more complex version of using some aliases that add --color=never (or whatever the appropriate flag is). Or if the user prefers to disable color by default, they could set $TERM to "xterm-old" and alias their favorite commands to add --color=always. Either way, there's no need for another environment variable.

Really though, this seems like it should be a feature of the shell or the terminal emulator. Does the process name match certain patterns? Pass color codes through. No? Strip them. That would solve the problem for all programs past, present, and future. And total development effort would likely be less than that needed to implement NO_COLOR in every command line tool.


I'll speak for my case specifically: a lot of programs I use have varying levels of colour-by-default, but I don't want that. What I want is to have them all off, except the ones that actually add value.

I think it's apparent that this isn't something you're really keen to support in ag in the few times I've seen you talk about it, so feel free to close the PR I have open there rather than leave it hanging without any reply.

https://github.com/ggreer/the_silver_searcher/pull/1207


> a lot of programs I use have varying levels of colour-by-default, but I don't want that. What I want is to have them all off, except the ones that actually add value.

Your use case seems perfectly valid to me, but how would NO_COLOR help you? Wouldn't it suppress all color in all programs when it's set? Assuming everyone gets on board, of course.


Yeah that seems to be what he wants!

He can then selectively enable color in the programs he wants by simply unsetting the NO_COLOR variable before running the program.


What if someone needs colors, but not the colors the util's author chose? What if two or more tools use colors in different ways? With a terminal that is aware of the problem you can remap or erase specific colors, but maybe it's wiser not to create the problem at all, since just making text bold may be enough for practical needs.

Edit: missing not


Sounds like a better standard would be the ability to define color schemas for CLI utilities. It would be pretty awesome if this were possible, though sadly many utilities vary in which type of output gets colored in which way.

Another workaround is to merely pipe the output to a file in certain cases.


> Another workaround is to merely pipe the output to a file in certain cases.

If that works for a certain tool, its author really has no excuse not to implement the NO_COLOR env var as well.


Piping the output does not guarantee the removal of the color codes. This is yet another standard some programs have chosen to implement.

The program checks if the output is a tty, if it is not a tty it will drop color information as it is either a pipe or a file. The escape codes would look ugly in a file and could get in your way with some text processing programs.

Even if the program did drop color when not sending output to a tty it still is a crappy way to remove color as the pipe would need another program as a receiver such as pager.


> Even if the program did drop color when not sending output to a tty it still is a crappy way to remove color as the pipe would need another program as a receiver such as pager.

You don't need a pager. cat works. Compare

    grep --color=auto hacker /usr/share/dict/words
with

    grep --color=auto hacker /usr/share/dict/words | cat
The same applies to GNU ls, which actually changes more than just colors depending on whether it's printing to a tty or not.

Programs that emit colors by default when not sending output to a tty are buggy.

A wild card in this mix is Windows. On UNIX-like systems, detecting a tty is simple. On Windows, it is... not so simple. But the OP's tool, blush, doesn't appear to support Windows at all anyway. (Which is totally cool. I very much understand why you might not.)


Yeah, I did not mean to imply you needed a pager, just something to receive the output, a pager is just one choice.

I guess if the program implemented the tty method, you could always wrap it in a shell function like blush() { /bin/blush "$@" | cat; } if you decided you did not like color.

Part of me says piping programs together to get the result you want is the *nix way. Another part of me feels like it's just a dirty hack to avoid color.


I don't think it matters so much on Windows, because the colour information is sent out of band. When you call SetConsoleTextAttribute on the STD_OUTPUT_HANDLE, and that handle is a file, the call simply fails.


I am not speaking without experience. If you support Windows, you probably need to support non-console environments like cygwin and msys. Moreover, Windows 10 has opt-in support for ANSI style coloring. At some point, coloring via the console APIs will probably stop being used.


That's the point. If the program is already doing isatty() and thus has code for color/no-color output already, just adding getenv("NO_COLOR") is a no-brainer.


I'm confused, with a tagline like "grep with colors!", wouldn't you just use grep if you don't want colors?

This feels like complaining that Spotify doesn't have a silent mode for people who dislike sound. /shrug


Why wouldn't you just write a command line tool that strips the color escapes and then set up wrappers like ag() { command ag "$@" | nocolor; } or whatever? Seems much easier than trying to convince every developer out there to support your environment variable.


Genuinely curious, why do some developers prefer not having colors for ls, grep, etc.? no-color.org mentions that many users prefer having colors disabled, but didn't list any reasons why.


I don't find that they add anything. I feel like they make it harder for me to do a coherent read of the screen, to suck in all the text & process it. I have my own mental algorithms to pick out relevant things, and having a bunch of glaringly contrasting blocks of color jumping out of the terminal at me just makes it harder to slurp in the screen. I don't want that segmentation. It's awful.

And the color themes for the terminal are godawful. No matter how you spin 16 colors, solarized or otherwise, everyone is more or less chained to that atrocious 16-color palette, which is always going to be way, way higher contrast or more lo-fi than something like a vim theme that can pick some complementary colors to work with.

Colors feel like my terminal punching me in the face. No thanks.

Also have you ever logged into ubuntu? Holy shit colors were a TERRIBLE idea.


That makes sense. In a way, it's like they're mental "speed bumps" that disrupt reading the text. I can certainly see why those "bumps" would be aggravating, thanks for the insight!


Colorless is not my cup of tea, but here is one perspective I read a few years ago: http://www.linusakesson.net/programming/syntaxhighlighting/i...

HN discussion: https://news.ycombinator.com/item?id=3717303


I disagree with the author's claim that "A splash of colour may grab the reader's attention, but it will inevitably decrease the legibility of the text."

Funny enough, I've found that adding syntax highlighting to even normal English text makes it more readable. I sometimes turn on a random language's syntax highlighting in e.g. Notepad++ even if the choice of colouring is basically nonsense.

In the article you link, apart from the green on grey, the little example paragraph is actually very nice and readable.

Something like http://www.beelinereader.com/individual would probably make things easier for me, though it costs money.

The other issue is that it is too uniform. I like how syntax highlighting tends to make random patterns which kind of looks like pictures in the code - it makes it much easier for me to navigate, and figure out where I was and where I'm going.


There was a link to a study here on HN some year(s) ago where they alternated the colour of each sentence in a text and found that reading speed increased. Probably a similar effect to making text columns narrower instead of using the full width of your screen: it's easier to navigate, and it's easier to read a whole chunk at once instead of word by word.


"...a splash of colour may grab the reader's attention, but it will inevitably decrease the legibility of the text."

This depends entirely on the content of the text. Stories, fiction, non-fiction, reference material ... information where one human is intending to communicate with another human should limit use of color.

For source code, that humans do read but is intended to be parsed and compiled by a computer, I'm not assembling (in my head) the whimsical trials of a protagonist - I need to see structure; color gives me a quicker overview of the structure of a line, function, class, etc.


My guess is that ANSI color escape codes mess with people's parsing code for the output when scripting.


All sane tools check if output isatty(3) before doing that.


Many don't though. Typically, though, the place to add no-color is even on the same line as this test and is a very cheap add, so anyone looking to add it would do well to consider adding both.


Many who? Did I miss the arrival of a new “Move Fast Etc.” era in the Unix CLI?


There's a bunch even on that list. I had a look at ffind, the second on the page and found nothing to test if the output was tty or otherwise.

Generally though, I'm not out to point fingers; it's just a fact that not all software is complete and bug-free. We're all people with limited time, and we don't always have the attention to detail it takes to deal with the maliciously compliant children that computers are.


As @abstractbeliefs said, there are a lot that don't, but personally I hate it when apps do that, because it removes my ability to keep the colors through pipe chains. It's much harder to convince an app that does this to keep its color through a pipe than it is to strip color from an app that doesn't pay attention to whether it's writing to a tty.


My terminal has non-default, long-battle-tested background and foreground colors, and I barely pay any attention to the entire palette, so coloring works like acid. Some colors are simply incompatible, so tuning them is meaningless. Imagine e.g. any dark blue on medium grey.

What I think would be good (for me at least) is not colors but some sort of markup, like bold or underline, which is available in the terminal, as seen in man pages and PS1. I usually make those bold and in a compatible color.


I turned most colors off many years ago. For me it was because several tools (especially ls) used colors that were very hard to distinguish with my terminal background. I found myself spending too much time trying to figure out what was on my screen.

Another bonus is that I don't feel cheated when I log into systems that don't support colors (*BSD, Solaris, etc). Everything just looks normal to me. Even non-syntax highlighted vi (not vim).


The tools don't exactly choose the color. They choose a color name from a palette. Your terminal's theme defines that palette.

If your terminal has a background very close to one of the colors on that palette, that is a theme problem, not an app problem.


This was a long time ago before terminal emulators had themes and I also did a lot of work on actual terminals (monochrome hardware) and the console (black background with gray text). Even my current emulator doesn't support "themes". I have to set each color register individually in the .Xresources file. On modern hardware this works fine, but years ago the quality of CRTs was such that you couldn't guarantee brightness and color levels across devices.

Do you remember the days of X Modelines? I still have the nightmares.


That's mostly true, but there are several different methods for sending colours via escape codes and the one you discuss is just the ANSI standard. There is also the 256-colour palette and even true colour (defined via RGB values). However most tools will use the standard 8 colour palette or the extended 8x3 colour palette (8 colours across 3 different intensities) because they're supported on pretty much every terminal emulator (even on Windows's cmd.exe).

So it is possible for tools to choose colours by an exact RGB value, but typically they won't.

The other thing to remember is that some tools can have their colours remapped, even ones people might not expect. E.g. you can configure `ls` (via environment variables IIRC) to have specific colours for specific inode types. However, in my personal opinion it is a lot of faff for very little reward.
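To make the palette-versus-RGB distinction concrete, here is a hedged Go sketch of the three common foreground forms. The basic colours (SGR 30-37) are palette indices that the terminal theme maps to real colours; the 256-colour form (38;5;n) is also an indexed palette, though themes usually only redefine its first 16 entries; the 38;2;r;g;b form names an exact RGB value. The function names are illustrative only.

```go
package main

import "fmt"

const esc = "\x1b["

// reset clears all SGR attributes (colour, bold, underline...).
const reset = esc + "0m"

// fg8 selects one of the 8 basic foreground colours (n in 0-7): SGR 30-37.
func fg8(n int) string { return fmt.Sprintf("%s%dm", esc, 30+n) }

// fg256 selects entry n (0-255) of the 256-colour palette: SGR 38;5;n.
func fg256(n int) string { return fmt.Sprintf("%s38;5;%dm", esc, n) }

// fgRGB selects an exact 24-bit colour using the semicolon form 38;2;r;g;b.
// (The ISO 8613-6 colon form, 38:2::r:g:b, is less widely supported.)
func fgRGB(r, g, b int) string { return fmt.Sprintf("%s38;2;%d;%d;%dm", esc, r, g, b) }

func main() {
	fmt.Println(fg8(1) + "red, as defined by your terminal theme" + reset)
	fmt.Println(fg256(208) + "256-colour palette entry 208" + reset)
	fmt.Println(fgRGB(255, 128, 0) + "exact RGB orange" + reset)
}
```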


What a great idea for a standard, so the tool will check the NO_COLOR environment variable, to see if it should display color.


+1 for NO_COLOR.

For example, yellow text on a dull yellow background? No thanks.

For years I've just wrapped commands on bash to strip the colour control-codes from stdout.

    function monochrome() { "$@" | sed -r 's/[[:cntrl:]]\[[0-9]{1,3}m//g'; }


    | sed -E 's/\x1b\[[0-9;]*m//g'

^ I have not run the above program, only proven it to be correct.


Sadly that only removes 1 of the 3 methods of escaping terminal colours. Plus there is also the risk of it removing other SGR (Select Graphic Rendition) escape codes such as underline. That latter bug might be an acceptable casualty though.


It was meant to remove underline, bold, dim, and even blink.

Are there xterm codes for generating terminal color that don't end in m? It handily covers 8/88/256/24bit. What else is needed?


24-bit colour can use the colon character as well. So if you want to remove all SGR escape codes* then it's a pretty minor addition to your regex.

* Personally I quite like the bold and reset SGRs even when I don't want the term colourised. Bold can highlight text without drawing your attention away from the main body of the terminal.


I'm still looking for the use of ":" in xterm codes, but already discovered that this is a thing:

"\033]11;#53186f\007"

So, at bare minimum, I'll need to include # in future. You can /generate/ 24-bit color using only ; and digits...

When filtering ANSI I'm usually streaming the result to something that expects pure ASCII, and tuning sed to only accept two escape codes would be a pain in the neck.
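A hypothetical `nocolor` filter in the spirit of the suggestions above, sketched in Go. Its pattern covers the two cases raised here: SGR sequences with ';' or ':' separators (so the colon-form 24-bit codes go too), and OSC sequences such as "\033]11;#53186f\007", terminated by BEL or by ESC \. It is a sketch under stated assumptions: it leaves cursor movement and every other escape sequence alone, and it works line by line, so a sequence split across a newline would survive.

```go
package main

import (
	"bufio"
	"os"
	"regexp"
)

// ansiRE matches either an SGR sequence (ESC [ params m, params being digits,
// ';' or ':') or an OSC sequence (ESC ] ... terminated by BEL or ESC \).
var ansiRE = regexp.MustCompile(`\x1b\[[0-9;:]*m|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)`)

// strip removes every SGR and OSC sequence matched by ansiRE.
func strip(s string) string { return ansiRE.ReplaceAllString(s, "") }

func main() {
	in := bufio.NewReader(os.Stdin)
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()
	for {
		line, err := in.ReadString('\n')
		out.WriteString(strip(line))
		if err != nil {
			// io.EOF (or a read error): the final, possibly partial line
			// has already been written above, so stop here.
			return
		}
	}
}
```

Usage would be along the lines of `some-tool | nocolor`.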


ISO 8613-6 uses colons. Admittedly it's a less supported standard.

Sadly handling terminal escape codes is one of those impossibly painful jobs. Heck, often even the terminals don't seem to support the escape codes they have documented or the documentation is so poor that you're basically left with trial and error (or a shit load of hex dumps to trawl through if you're lucky). eg I've wasted hours trying to get Kitty specific escape codes working on that terminal emulator before giving up.

I'm investing a lot of effort at the moment learning escape codes because I'm writing a new $SHELL which is designed to have rich media support even in $TERMs which don't support rich content. But I do still have support to convert the $SHELL back into a black and white console. (I even have an option to strip colour from the STDOUT of all processes, even when pipelined, negating the need for your sed command. However, that feature hasn't yet been committed back to the master branch.)


There is a --no-colour/--no-color argument you can pass.


this seems a far worse solution than pushing for a standard of encouraging apps to support --no-color

I think there are FAR fewer people who want it off everywhere by default than people who want it off occasionally

having all the arguments to your app be handled via dash args EXCEPT the one for color is non-standard, harder to explain in the docs, introduces an inconsistency in how your app is controlled, etc., etc...
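One way to let a flag and the environment variable coexist, sketched in Go with the standard flag package: an explicit --color=always or --color=never wins, and NO_COLOR only affects the default auto mode. This precedence is an assumption (it mirrors how many tools layer command-line options over environment defaults), not something blush is known to implement; resolveColor is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveColor decides whether to emit colour. Assumed precedence:
//   1. an explicit "always" or "never" wins;
//   2. otherwise ("auto") a non-empty NO_COLOR disables colour;
//   3. otherwise colour only when stdout is a terminal.
func resolveColor(mode string, noColor, tty bool) (bool, error) {
	switch mode {
	case "always":
		return true, nil
	case "never":
		return false, nil
	case "auto":
		return tty && !noColor, nil
	default:
		return false, fmt.Errorf("invalid --color value %q (want auto, always or never)", mode)
	}
}

func main() {
	// The flag package accepts both -color=... and --color=...
	mode := flag.String("color", "auto", "colorize output: auto, always or never")
	flag.Parse()

	fi, err := os.Stdout.Stat()
	tty := err == nil && fi.Mode()&os.ModeCharDevice != 0

	on, err := resolveColor(*mode, os.Getenv("NO_COLOR") != "", tty)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("colour enabled:", on)
}
```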


Is this speed competitive with tools like the silver searcher (`ag`) or is the focus here on color?


As much as I love `ag`, I feel like ripgrep (https://github.com/BurntSushi/ripgrep) deserves mentioning when it comes to speed. If you haven't tried it, do it sooner rather than later.

Here's an excellent write-up on how it works, benchmarks, etc.: https://blog.burntsushi.net/ripgrep/


ripgrep is soooooo good. I have switched to it and will never look back.


A quick look at the source shows that it appears to be linear and just uses `strings.Contains` or `r.MatchString` on each line, so I don't think it has any of the optimizations that are built into `ag`.


That is correct. The project is in its early stages. I want to see what the community needs most and shape the project towards that goal. Also, I've tried to avoid optimisations until most of the functionality is implemented.


It's a very nice idea and you should be proud of what you've built, but my personal opinion is that speed is a core feature of `grep`.

A good place to start would be this: why GNU grep is fast [1]. Start with the Boyer-Moore string search algorithm, then read through the optimizations done in GNU grep.

p.s. there's an implementation of Boyer-Moore hiding in Go's standard library.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...
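For reference, a compact Go sketch of Boyer-Moore-Horspool, a simplified Boyer-Moore variant that keeps only the bad-character shift table and compares each window right to left. It is illustrative only: the Boyer-Moore code inside Go's standard library isn't exported, and in practice bytes.Index or strings.Index are the calls to reach for. horspoolIndex is a hypothetical name.

```go
package main

import "fmt"

// horspoolIndex returns the index of the first occurrence of pattern in
// text, or -1 if there is none.
func horspoolIndex(text, pattern []byte) int {
	m, n := len(pattern), len(text)
	if m == 0 {
		return 0
	}
	if m > n {
		return -1
	}
	// shift[c] is how far the window may slide when the text byte aligned
	// with the pattern's last position is c. Bytes that do not occur in
	// pattern[:m-1] allow a full shift of m.
	var shift [256]int
	for i := range shift {
		shift[i] = m
	}
	for i := 0; i < m-1; i++ {
		shift[pattern[i]] = m - 1 - i
	}
	for pos := 0; pos <= n-m; pos += shift[text[pos+m-1]] {
		// Compare right to left, as Boyer-Moore does.
		j := m - 1
		for j >= 0 && text[pos+j] == pattern[j] {
			j--
		}
		if j < 0 {
			return pos
		}
	}
	return -1
}

func main() {
	fmt.Println(horspoolIndex([]byte("grep with colours"), []byte("colour")))
}
```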


Thanks mate, I will definitely have a read.


Note that you don't need Boyer Moore for the common case. ripgrep for example will very rarely use Boyer Moore. Its work horse is much simpler and typically faster: https://github.com/rust-lang/regex/blob/master/src/literal/m...

In Go-land, you should be able to replace uses of memchr with IndexByte[1], which should be implemented in Assembly on most platforms.

Of course, for any of this to have a big impact, you'll want to take Mike Haertel's advice on avoiding line breaking and stop using bufio.Scanner. :-)

[1] - https://golang.org/pkg/bytes/#IndexByte


So far I've only been concerned with the code's simplicity, until I understand what needs to be done. This is not going to be grep or ripgrep. I made it because I needed the tool, and I thought someone else might like it too; now it is a joy to see people looking at the project.

There are a couple of places I wish I had done better. Using bufio.Scanner actually bothers me a lot. Also, the Read() method reads everything from all readers into a buffer instead of pulling only what it needs to check.

Thanks for suggestions :)


I'm okay with it being not as fast because speed is not the goal here, but rather highlighting specific patterns to make it easier to spot for the human eyes, especially when tailing log lines from your development webserver.


> "to make it easier to spot for the human eyes,"

I suppose in that sense it does aim to be fast. Fast for the human to parse.


I've got the linux-4.17 kernel tree around, 61,322 files.

My desktop is running ubuntu-18.04, is an i5-3570, and has a fairly quick intel SSD.

Running "blush -R -i FunctionName ." takes 15.090 seconds and finds two files.

Running "ag -i FunctionName" finds one file, missing the one in .clang-format.

Running "ag -i -u FunctionName" finds both files and takes 0.64 seconds.

So somewhere around 20-25x faster (15.090 / 0.64 ≈ 23.6).


Thank you for doing the comparison. Would you do the same against the latest version (v0.5.0) please? Thank you.


I’m doubtful ag and rg use a lot of smart optimizations to get their speed.


Ummm... you're kidding, right?

I know ripgrep has a ton of fantastic optimizations by Burntsushi.

You might wanna check it out...before making such statements.


I could be wrong, but I read valarauca1's comment as "I’m doubtful. ag and rg use a lot of smart optimizations to get their speed."


lol people's reading comprehension is so bad sometimes


In fairness, it's the GP's fault on this occasion for not punctuating his or her post. Decarep just read the GP's post as it was literally written (I had to read it 3 times myself to gauge what I thought the post meant).


contextualization is a component of reading comprehension


The post still made contextual sense when read literally. It just wasn't technically accurate. Hence why it was so easy to misinterpret.

Plus the next time you make sweeping generalisations about the reading comprehension abilities of HN it is probably worth remembering that this is an international community and thus English isn't going to be everyone's first language.


Reading comprehension is difficult when there's missing punctuation. For example, "Let's eat grandma" versus "Let's eat, grandma".




Nice job! Language-specific coloring is really nice!

I'll give it a try. I normally use a different Golang tool called sift as my grep replacement (which I love so far): https://github.com/svent/sift

Sift's goals seem to be mostly performance (it is super fast), but it would be nice to have some of these more sophisticated coloring features in there as well, as they are useful.


Cheers mate!


Nice UI! Some time ago I wrote something similar, because I was missing some features in ripgrep (which is otherwise pretty awesome): https://github.com/dominikschulz/gg


I like it. I started something similar with node (I never aimed for performance) trying to go for high grep compatibility but with added extra colors and js regexp flavour.


GNU grep has support for colors.


Please reread the examples, which are specifying multiple searches and custom colors for each type of match, something that GNU grep can't do.


But... Can it elegantly suppress broken pipe errors?


Signal handling is not implemented yet. I'd appreciate it if you filed an issue when you find any. Thanks.


Really cool! But from the title I initially thought this was a grep tool for finding certain colors in your image data.


Useless use of cat candidate.


This is one of my pet peeves - complaining about technically unnecessary, but fully benign uses of cat.

Yes, 'cat FILENAME | blush "some text"' and 'blush "some text" < FILENAME' do the same thing. But what if you don't have permission to read the file? The former can be rewritten as 'sudo cat FILENAME | blush "some text"'; the latter can't. What if you want to build a pipeline? I think it's pretty persuasive that 'cat FILENAME | blush "some text" | sort' reads better than 'blush "some text" < FILENAME | sort': the former reads from left to right, the latter reads from the middle, to the left, and then bounces over to the right. Tastes may vary, but I think it's a hard sell that such an opinion is clearly wrong.

So, yes, it's unnecessary. And, yes, in a script using cat like that can complicate error handling. But for interactive use, what advice exactly are you trying to convey?


Sometimes things are lost over text -- my comment was meant to be more whimsical than it actually read. Just the old http://porkmail.org/era/unix/award.html joke.

My actual (light) advice is merely that we're all guilty of inappropriately using IO redirection facilities that punish the performance of our shells. `cat <filename> | grep <expression>` should be replaced by `grep <expression> <filename>`.

No doubt that pipelines are easier to read. The author has a whole section in README demonstrating blush's ability to read STDIN - nothing is lost by using best practices everywhere else. Documentation matters, and it should communicate best practices.


A solution that I personally prefer, because the common form looks weird to me:

    < FILENAME blush "some text"


I've found that unnecessary use of cat can sometimes have a noticeable performance impact, and I suspect that on the right system with the right filesizes this could be one of those circumstances. (Obviously it doesn't matter one whit with small files.)


There's a difference between useless and unnecessary, both in definition and how a reader views the statement.


It's clearly there to demonstrate the pipe support...



