Lichess uses Stockfish 14+ NNUE compiled with WASM, so it should be essentially ...

dragontamer · on Dec 15, 2021

Except Stockfish has some hand-written x86-specific bits, such as PEXT / PDEP (both BMI2 instructions). I'm talking hand-tuned assembly here, really good stuff and very high performance.

Running Stockfish 30% to 40% slower for no reason at all, and with far less RAM / resources means that you'll get weaker analysis at slower speeds.

And those "tuning options" include tablebases, which would provably play perfectly (and Stockfish is smart enough to "think" with tablebases: if it can prove a won position with a tablebase, it won't bother exploring and will just report the mate-in-300 immediately).

stouset · on Dec 15, 2021

> Running Stockfish 30% to 40% slower for no reason at all, and with far less RAM / resources means that you'll get weaker analysis at slower speeds.

Even if 100% true, this is a complete non-issue. Absolutely nobody reading this comment is going to notice a meaningful difference or benefit from running their games against a 3550 Elo engine running hand-optimized x86 instructions versus a 3200 Elo engine running in WASM in the browser.

The extreme convenience of pressing one button to receive high-quality analysis (and ongoing live analysis powered by a WASM engine) beats the annoyance of exporting your games to PGN and importing them into an external UI just for the sake of analyzing your games even farther past the horizon that humans will ever truly be able to understand. If you're a competitive GM or Super GM? Sure, maybe that level of analysis will help you find unexplored lines for your preparation. For anyone else? It's completely irrelevant.

dragontamer · on Dec 15, 2021

> The extreme convenience of pressing one button to receive high-quality analysis (and ongoing live analysis powered by a WASM engine) beats the annoyance of exporting your games to PGN and importing them into an external UI just for the sake of analyzing your games even farther past the horizon that humans will ever truly be able to understand. If you're a competitive GM or Super GM? Sure, maybe that level of analysis will help you find unexplored lines for your preparation. For anyone else? It's completely irrelevant.

It's one button click for me.

"Download .PGN" button -> open with SCID vs PC (which I set up as the default app for PGNs).

I hit "F2" and bam, analysis is going on already. Sure, you have to set it up so that the F2 engine is Stockfish / tuned parameters as appropriate, but in the long run, having control of this is clearly beneficial. Especially if you're going to be using computer analysis to teach yourself more advanced tactics.

thom · on Dec 15, 2021

PEXT is mostly for magic bitboards, right? So it’s more of a one time cost. But still, I’d be amazed if it weren’t much slower in the browser now and forever.

dragontamer · on Dec 15, 2021

No. Magic Bitboards are what you use if you don't have PEXT available. EDIT: I expect the WASM version to use magic-bitboards (which is just a "magical" multiply instruction + a lookup).

Every single sliding piece (i.e. Rooks, Bishops, and Queens) uses PEXT (or magic-bitboards if the PEXT code is unavailable) to determine where to go. It's a fundamental calculation to determine very, very quickly which moves are legal or not.

Both PEXT and Magic Bitboards are incredible techniques for solving the sliding-piece question. PEXT is much, much faster, but it relies upon an obscure x86-only instruction that is unfortunately slower on AMD Zen and Zen2. So you still need to know your hardware details; Intel machines should use the PEXT version.
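
To make the two index calculations concrete, here is a minimal C++ sketch for a single square (a bishop on a1). It is not Stockfish's code: `pext_soft` is a portable stand-in for the hardware PEXT instruction, the multiplier is a constant derived by hand for this one square only, and all names are made up for illustration.

```cpp
#include <cstdint>

// Portable stand-in for BMI2 PEXT: gathers the bits of `value` selected by
// `mask` into the low bits of the result, preserving their order.
constexpr std::uint64_t pext_soft(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out = 1; mask != 0; out <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit of mask
        if (value & lowest) result |= out;
        mask &= mask - 1;                           // clear that bit
    }
    return result;
}

// Relevant blocker squares for a bishop on a1: b2, c3, d4, e5, f6, g7.
constexpr std::uint64_t kA1BishopMask = 0x0040201008040200ULL;

// Illustrative multiplier, hand-derived for this square only (an assumption,
// not a constant taken from Stockfish). It moves the six mask bits into
// bits 58..63 of the product without any carries.
constexpr std::uint64_t kA1BishopMagic = 0x0002020202020200ULL;
constexpr int kA1BishopShift = 64 - 6;  // keep the top 6 bits

// Magic-bitboard style index: and, multiply, shift.
constexpr unsigned magic_index(std::uint64_t occupied) {
    return static_cast<unsigned>(
        ((occupied & kA1BishopMask) * kA1BishopMagic) >> kA1BishopShift);
}

// PEXT style index: one bit-extract.
constexpr unsigned pext_index(std::uint64_t occupied) {
    return static_cast<unsigned>(pext_soft(occupied, kA1BishopMask));
}

// Counts the blocker configurations for which both schemes give the same index.
int count_matching_indices() {
    int matches = 0;
    std::uint64_t subset = 0;
    do {
        if (magic_index(subset) == pext_index(subset)) ++matches;
        subset = (subset - kA1BishopMask) & kA1BishopMask;  // next subset of the mask
    } while (subset != 0);
    return matches;
}
```

For this particular multiplier the two indices agree on all 64 blocker configurations. In general a magic index only has to be collision-free, not identical to the PEXT index. On BMI2 hardware `pext_soft` would be replaced by the `_pext_u64` intrinsic from `<immintrin.h>`.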

---------

Still, even AMD users want the BMI2 version, which speeds up a bunch of bit-operations that are used all the time. (And besides, AMD Zen3 is actually faster on PEXT now, so PEXT is looking good for the future)

-----

In either case, I have my doubts that the NNUE neural-net runs anywhere near as well on WASM as on the hand-optimized SIMD / AVX kernels that the Stockfish team wrote.

Even if there's some auto-vectorization that WASM can do, it's really hard to beat handcrafted assembly / intrinsics (https://github.com/official-stockfish/Stockfish/blob/master/...)

thom · on Dec 15, 2021

Yeah, that’s all I meant, PEXT does the same job as magic bitboards where it’s available. But my point was that either way it’s run once (I seem to remember it’s even a constexpr in recent Stockfishes) so I’d be surprised if it were a major performance hit.

dragontamer · on Dec 15, 2021

> But my point was that either way it’s run once

Both Magic Bitboards and PEXT are run every single time Stockfish thinks of a Queen, Bishop, or Rook. Literally every, single, time.

Stockfish searches something like 10 million positions per second. That's a lot of times the "where can the Queen move" analysis is run. Speeding that routine up by 50% or so (PEXT vs Magic Bitboards) really does make a difference in the grand scheme of things.

----------

Unless you're playing some fantasy-version of Chess (or maybe some extreme endgame where all pawns, queens, bishops, and rooks are dead... pawns because they might promote into a queen/bishop/rook), you'll be benefiting from selecting that PEXT version if you're on an Intel or Zen3 processor.

zwegner · on Dec 16, 2021

Is the performance difference of PEXT vs. the classic 64-bit magic bitboards actually close to 50%? The very slight latency increase of and/multiply/shift instead of pext should be partly offset by somewhat smaller tables (since magic hashing can have helpful collisions), right?

I should probably know this, since I "discovered" PEXT bitboards, but I got out of computer chess before I got BMI-capable hardware, and so I never actually implemented them :)

agalunar · on Dec 16, 2021

Variable-shift perfect hashing (fancy magic) is not especially slower than using pext/pdep, and in any case, there is little value in making move generation more performant. A decent move generator using perfect hashing can run at 40 mnode/s, but Stockfish runs at about 1 mnode/s with a single thread (because of the other work it does). A quick application of Amdahl's law shows that the performance gain from speeding up move generation by any amount is negligible. In fact, with a transposition table and staged move generation, for many nodes only a fraction of possible moves are generated.
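
A rough sketch of that Amdahl's law arithmetic in C++, under a deliberately pessimistic assumption that is mine rather than a measurement: every node pays for one full move generation at the 40 mnode/s rate, so move generation is at most 1/40 of the time spent per node at 1 mnode/s. The function and constant names are hypothetical.

```cpp
// Amdahl's law: overall speedup when a fraction `p` of the runtime
// is made `s` times faster and the rest is unchanged.
constexpr double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound on the overall speedup if that fraction cost nothing at all.
constexpr double amdahl_limit(double p) {
    return 1.0 / (1.0 - p);
}

// Assumed share of per-node time spent in move generation:
// (1 / 40 mnode/s) out of (1 / 1 mnode/s). An assumption, not a profile.
constexpr double kMoveGenFraction = 1.0 / 40.0;

// Overall gain if move generation alone became 1.5x faster.
constexpr double kGainFrom50Percent = amdahl_speedup(kMoveGenFraction, 1.5);

// Overall gain if move generation became free.
constexpr double kGainUpperBound = amdahl_limit(kMoveGenFraction);
```

With these numbers a 1.5x faster move generator gives roughly a 0.8% overall speedup, and even a free one is capped at about 2.6%.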

thom · on Dec 15, 2021

Gotcha sorry, been a long time since I did this in chess and the last engine I wrote was for Hnefatafl where you’re using slightly bigger boards and I don’t remember there being a PEXT equivalent for bigger SSE stuff. Somehow convinced myself the lookup table generation was the only hard bit.

dragontamer · on Dec 15, 2021

Yeah, PEXT only works on 64-bit numbers (and therefore, the 8x8 chess board / 64-bit "bitboard", with 1-bit per position).

EDIT: Both PEXT and Magic Bitboards have a "Setup" phase that is run once. But there's also a separate phase that is run on every single position in the actual chess-board part.
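
A minimal C++ sketch of those two phases for rook attacks on one square. This is an illustration, not Stockfish's implementation: `pext_soft` stands in for the hardware instruction, and `RookTable`, `build_rook_table` and the other names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

using Bitboard = std::uint64_t;

// Portable stand-in for BMI2 PEXT (gathers the masked bits into the low bits).
Bitboard pext_soft(Bitboard value, Bitboard mask) {
    Bitboard result = 0;
    for (Bitboard out = 1; mask != 0; out <<= 1) {
        Bitboard lowest = mask & (~mask + 1);
        if (value & lowest) result |= out;
        mask &= mask - 1;
    }
    return result;
}

// Slow reference: walk each of the four rook rays until a blocker is hit.
Bitboard rook_attacks_slow(int square, Bitboard occupied) {
    static const int dr[4] = {1, -1, 0, 0};
    static const int df[4] = {0, 0, 1, -1};
    Bitboard attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = square / 8 + dr[d], f = square % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            Bitboard bit = Bitboard{1} << (r * 8 + f);
            attacks |= bit;
            if (occupied & bit) break;  // the blocker square itself is attacked
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

// Relevant blocker squares: the rook's rays minus the last square of each ray,
// since a piece on the board edge cannot hide anything behind it.
Bitboard rook_mask(int square) {
    Bitboard mask = 0;
    int r0 = square / 8, f0 = square % 8;
    for (int r = r0 + 1; r <= 6; ++r) mask |= Bitboard{1} << (r * 8 + f0);
    for (int r = r0 - 1; r >= 1; --r) mask |= Bitboard{1} << (r * 8 + f0);
    for (int f = f0 + 1; f <= 6; ++f) mask |= Bitboard{1} << (r0 * 8 + f);
    for (int f = f0 - 1; f >= 1; --f) mask |= Bitboard{1} << (r0 * 8 + f);
    return mask;
}

struct RookTable {
    Bitboard mask;
    std::vector<Bitboard> attacks;  // indexed by pext(occupied, mask)
};

// Setup phase, run once: precompute the attack set for every blocker subset.
RookTable build_rook_table(int square) {
    RookTable t{rook_mask(square), {}};
    t.attacks.assign(4096, 0);  // a rook mask has at most 12 bits
    Bitboard subset = 0;
    do {
        t.attacks[pext_soft(subset, t.mask)] = rook_attacks_slow(square, subset);
        subset = (subset - t.mask) & t.mask;  // next subset of the mask
    } while (subset != 0);
    return t;
}

// Per-position phase, run constantly during search: one extract and one load.
Bitboard rook_attacks_fast(const RookTable& t, Bitboard occupied) {
    return t.attacks[pext_soft(occupied, t.mask)];
}
```

`build_rook_table` is the one-time cost, while `rook_attacks_fast` is the part that runs for every sliding piece in every position searched. A magic-bitboard version would keep the same structure and only swap the index calculation.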

thom · on Dec 16, 2021

Thanks for the refresher. You can tell I've abandoned these things half way through too many times! :)