PEXT is mostly for magic bitboards, right? So it’s more of a one-time cost. But still, I’d be amazed if it weren’t much slower in the browser now and forever.
No. Magic Bitboards are what you use if you don't have PEXT available. EDIT: I expect the WASM version to use magic-bitboards (which is just a "magical" multiply instruction + a lookup).
Every single sliding piece (i.e. Rooks, Bishops, and Queens) uses PEXT (or magic bitboards if PEXT is unavailable) to determine where it can go. It's a fundamental calculation for determining very, very quickly which moves are legal.
Both PEXT and Magic Bitboards are incredible techniques for solving the sliding-piece question. However, PEXT is much, much faster, but it relies on an obscure x86-only instruction that is unfortunately slow on AMD Zen and Zen2. So you still need to know your hardware details: Intel machines should use the PEXT version.
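For readers who haven't seen the trick, here is a minimal sketch of a PEXT-indexed rook lookup. It is not Stockfish's code: `pext_u64` is a portable software stand-in for the hardware instruction (the `_pext_u64` intrinsic), and the names `PextTable`, `build_pext_table`, `rook_mask`, and `rook_attacks_slow` are made up for illustration. Squares are assumed to be numbered a1 = 0 through h8 = 63.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for the BMI2 PEXT instruction: packs the bits of
// `value` selected by `mask` into the low bits of the result.
std::uint64_t pext_u64(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out = 1; mask != 0; out <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (value & lowest) result |= out;
        mask &= mask - 1;                           // clear it and move on
    }
    return result;
}

// Rook attacks from `sq`, walking each ray until it hits a blocker.
// Slow, so it is only used while building the table.
std::uint64_t rook_attacks_slow(int sq, std::uint64_t occ) {
    static const int dr[4] = {1, -1, 0, 0};
    static const int df[4] = {0, 0, 1, -1};
    std::uint64_t attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            std::uint64_t bit = 1ULL << (r * 8 + f);
            attacks |= bit;
            if (occ & bit) break;  // the blocker itself is attacked, nothing beyond it
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

// Squares whose occupancy can change the result: the rook's rays minus the
// last square of each ray (a piece there has nothing behind it to shield).
std::uint64_t rook_mask(int sq) {
    std::uint64_t mask = 0;
    int r0 = sq / 8, f0 = sq % 8;
    for (int r = r0 + 1; r < 7; ++r) mask |= 1ULL << (r * 8 + f0);
    for (int r = r0 - 1; r > 0; --r) mask |= 1ULL << (r * 8 + f0);
    for (int f = f0 + 1; f < 7; ++f) mask |= 1ULL << (r0 * 8 + f);
    for (int f = f0 - 1; f > 0; --f) mask |= 1ULL << (r0 * 8 + f);
    return mask;
}

struct PextTable {
    std::uint64_t mask;
    std::vector<std::uint64_t> attacks;
};

// Setup phase: run once per square.
PextTable build_pext_table(int sq) {
    assert(sq >= 0 && sq < 64);
    PextTable t;
    t.mask = rook_mask(sq);
    t.attacks.resize(pext_u64(t.mask, t.mask) + 1);  // 2^(bits in mask) slots
    std::uint64_t subset = 0;
    do {  // visit every subset of the mask
        t.attacks[pext_u64(subset, t.mask)] = rook_attacks_slow(sq, subset);
        subset = (subset - t.mask) & t.mask;
    } while (subset != 0);
    return t;
}

// Per-position phase: one PEXT plus one table load.
std::uint64_t rook_attacks(const PextTable& t, std::uint64_t occ) {
    return t.attacks[pext_u64(occ, t.mask)];
}
```

With real BMI2 hardware the `pext_u64` loop collapses into a single instruction, which is the whole point. The result still includes squares occupied by your own pieces, so an engine would mask those out afterwards.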
---------
Still, even AMD users want the BMI2 build, since BMI2 speeds up a bunch of bit operations that are used all the time. (And besides, AMD Zen3 is actually fast on PEXT now, so PEXT is looking good for the future.)
-----
In either case, I have my doubts that the NNUE neural net runs anywhere near as well on WASM as on the hand-optimized SIMD / AVX kernels that the Stockfish team wrote.
Yeah, that’s all I meant, PEXT does the same job as magic bitboards where it’s available. But my point was that either way it’s run once (I seem to remember it’s even a constexpr in recent Stockfishes) so I’d be surprised if it were a major performance hit.
Both Magic Bitboards and PEXT are run every single time Stockfish thinks of a Queen, Bishop, or Rook. Literally every, single, time.
Stockfish does something like 10 million positions per second. That's a lot of times the "where can the Queen move" analysis is run. Speeding that routine up by 50% or so (PEXT vs Magic Bitboards) really does make a difference in the grand scheme of things.
----------
Unless you're playing some fantasy-version of Chess (or maybe some extreme endgame where all pawns, queens, bishops, and rooks are dead... pawns because they might promote into a queen/bishop/rook), you'll be benefiting from selecting that PEXT version if you're on an Intel or Zen3 processor.
Is the performance difference of PEXT vs. the classic 64-bit magic bitboards actually close to 50%? The very slight latency increase of and/multiply/shift instead of pext should be partly offset by somewhat smaller tables (since magic hashing can have helpful collisions), right?
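To make the and/multiply/shift lookup concrete, here is a rough sketch for a single rook square. Nothing here is taken from an actual engine: the magic number is found by a plain random search with a fixed-seed xorshift generator, and `MagicEntry`, `fill_table`, `find_rook_magic`, and the helper names are hypothetical. Squares are assumed to be numbered a1 = 0 through h8 = 63. The search accepts collisions between occupancies that produce the same attack set, which is the "helpful collision" case.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rook attacks from `sq` by walking each ray until a blocker (setup only).
std::uint64_t rook_attacks_slow(int sq, std::uint64_t occ) {
    static const int dr[4] = {1, -1, 0, 0};
    static const int df[4] = {0, 0, 1, -1};
    std::uint64_t attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            std::uint64_t bit = 1ULL << (r * 8 + f);
            attacks |= bit;
            if (occ & bit) break;
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

// Relevant occupancy squares: the rays minus the last square of each ray.
std::uint64_t rook_mask(int sq) {
    std::uint64_t mask = 0;
    int r0 = sq / 8, f0 = sq % 8;
    for (int r = r0 + 1; r < 7; ++r) mask |= 1ULL << (r * 8 + f0);
    for (int r = r0 - 1; r > 0; --r) mask |= 1ULL << (r * 8 + f0);
    for (int f = f0 + 1; f < 7; ++f) mask |= 1ULL << (r0 * 8 + f);
    for (int f = f0 - 1; f > 0; --f) mask |= 1ULL << (r0 * 8 + f);
    return mask;
}

// Small xorshift64 generator; the fixed seed keeps the search repeatable.
struct XorShift64 {
    std::uint64_t state;
    std::uint64_t next() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }
    // ANDing three draws yields few set bits, which tends to suit magics.
    std::uint64_t sparse() { return next() & next() & next(); }
};

struct MagicEntry {
    std::uint64_t mask = 0;
    std::uint64_t magic = 0;
    int shift = 0;
    std::vector<std::uint64_t> attacks;
};

// Fills the table for the candidate magic. Fails only when two occupancies
// with DIFFERENT attack sets land in the same slot.
bool fill_table(int sq, MagicEntry& e) {
    // 0 marks an empty slot: a rook always attacks at least one square.
    e.attacks.assign(std::uint64_t(1) << (64 - e.shift), 0);
    std::uint64_t subset = 0;
    do {
        std::uint64_t index = (subset * e.magic) >> e.shift;
        std::uint64_t attacks = rook_attacks_slow(sq, subset);
        if (e.attacks[index] == 0) e.attacks[index] = attacks;
        else if (e.attacks[index] != attacks) return false;
        subset = (subset - e.mask) & e.mask;  // next subset of the mask
    } while (subset != 0);
    return true;
}

// Setup phase: random search for a working magic (run once per square).
bool find_rook_magic(int sq, MagicEntry& e, long max_tries = 1000000) {
    assert(sq >= 0 && sq < 64);
    e.mask = rook_mask(sq);
    int bits = 0;
    for (std::uint64_t m = e.mask; m != 0; m &= m - 1) ++bits;
    e.shift = 64 - bits;
    XorShift64 rng{0x9E3779B97F4A7C15ULL};
    for (long i = 0; i < max_tries; ++i) {
        e.magic = rng.sparse();
        if (fill_table(sq, e)) return true;
    }
    return false;
}

// Per-position phase: and, multiply, shift, load.
std::uint64_t rook_attacks(const MagicEntry& e, std::uint64_t occ) {
    return e.attacks[((occ & e.mask) * e.magic) >> e.shift];
}
```

This sketch keeps a full 2^bits table per square, so it doesn't actually cash in on the smaller tables; that would need a search that also tries shorter indices, which is left out here.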
I should probably know this, since I "discovered" PEXT bitboards, but I got out of computer chess before I got BMI-capable hardware, and so I never actually implemented them :)
Variable-shift perfect hashing (fancy magic) is not especially slower than using pext/pdep, and in any case, there is little value in making move generation more performant. A decent move generator using perfect hashing can run at 40 Mnodes/s, but Stockfish runs at about 1 Mnode/s with a single thread (because of the other work it does). A quick application of Amdahl's law shows that the performance gain from speeding up move generation by any amount is negligible. In fact, with a transposition table and staged move generation, for many nodes only a fraction of the possible moves are generated.
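Spelling out the Amdahl's law step with the figures above, as a back-of-the-envelope estimate: assume (and this is an assumption, not a measurement) that move generation costs the same per node inside the engine as it does standalone. At 40 million nodes per second standalone versus about 1 million in the full engine, move generation is then roughly a fraction p = 1/40 of the time per node, and speeding it up by a factor s gives an overall speedup S(s):

```latex
\begin{align*}
S(s) &= \frac{1}{(1 - p) + p/s}, \qquad p \approx \frac{1}{40} = 0.025 \\
S(1.5) &= \frac{1}{0.975 + 0.025/1.5} \approx 1.008 \\
\lim_{s \to \infty} S(s) &= \frac{1}{0.975} \approx 1.026
\end{align*}
```

So under that assumption a 50% faster move generator buys under 1% overall, and even an infinitely fast one is capped at about 2.6%.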
Gotcha sorry, been a long time since I did this in chess and the last engine I wrote was for Hnefatafl where you’re using slightly bigger boards and I don’t remember there being a PEXT equivalent for bigger SSE stuff. Somehow convinced myself the lookup table generation was the only hard bit.
Yeah, PEXT only works on numbers up to 64 bits wide (which is exactly what the 8x8 chess board needs: a 64-bit "bitboard", with 1 bit per position).
EDIT: Both PEXT and Magic Bitboards have a "Setup" phase that is run once. But there's also a separate phase that is run on every single position in the actual chess-board part.