I've got a (relatively old) snapshot of the English Wikipedia that I'm using for testing. The snapshot is around 200 GiB in 14,000,000 files and compresses down to an 11 GiB DwarFS image.
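For anyone wanting to reproduce this: building the image is a single mkdwarfs invocation. A minimal sketch, with illustrative paths and an assumed compression level:

    # pack the snapshot directory into a DwarFS image
    # -l 9 is the slowest/smallest preset (the default is -l 7)
    mkdwarfs -i /data/enwiki-snapshot -o enwiki.dwarfs -l 9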
I guess it's without the pictures, then?
Because if I compare with the ZIM file format (which is optimized for this use case), https://kiwix.org/en/what-is-the-size-of-wikipedia/ says:
"As of October 2022, the Full English Wikipedia (ca. 6.5 million articles), with images will use up 91GB of storage space (German and French, the second-largest: 36 GB). (...) If you can do without the images (what we call the nopic version), then you are down to 46 GB."
> As of May 2015, the current version of the English Wikipedia article / template / redirect text was about 51 GB uncompressed in XML format.
The compressed data at the same time was 11.5 GB. And that's from 9 years ago, and just the English Wikipedia.
For comparison, I collect leaked password dumps, and they (combined, after deduplication) run into the hundreds of GBs too. And those are just username:password lines, not even full text.
It's ever so slightly smaller than a .tar.xz of the same data. The main difference is that you don't have to fully extract it in order to access the data.
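Concretely, you mount the image via FUSE and read files in place, whereas a single-stream .tar.xz has to be decompressed from the start to reach anything. A sketch with illustrative paths:

    # mount the image read-only and read one file directly
    dwarfs enwiki.dwarfs /mnt/wiki
    less /mnt/wiki/wiki/Some_Article.html
    fusermount -u /mnt/wiki

    # with tar.xz, even extracting a single member decompresses
    # the stream linearly up to that point
    tar -xJf enwiki.tar.xz wiki/Some_Article.html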