4-bit quantization is starting to be supported for training as well on newer NVIDIA hardware. I believe the gpt-oss models were trained natively in MXFP4, whose elements are 4-bit floating-point values in E2M1 format (1 sign bit, 2 exponent bits, 1 mantissa bit).
It doesn't seem terribly common yet, though; I think keeping training numerically stable at that precision is the hard part.
MXFP4 is a block-scaled floating-point format. The E2M1 encoding applies to individual values, but each block of 32 values also carries a shared 8-bit scale (E8M0, a bare power-of-two exponent) that scales the whole block.
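To make the layout concrete, here's a minimal sketch of decoding one MXFP4 block in Python, assuming the OCP Microscaling spec's encoding (E2M1 elements with exponent bias 1, and an E8M0 shared scale with bias 127; special scale values like NaN are ignored here for simplicity):

```python
def e2m1_to_float(nibble: int) -> float:
    """Decode one 4-bit E2M1 value: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    e = (nibble >> 1) & 0b11
    m = nibble & 1
    if e == 0:
        mag = 0.5 * m                     # subnormal range: 0 or 0.5
    else:
        mag = 2.0 ** (e - 1) * (1 + 0.5 * m)  # exponent bias is 1
    return sign * mag

def decode_mxfp4_block(nibbles: list[int], scale_byte: int) -> list[float]:
    """Decode a 32-value MXFP4 block given its shared E8M0 scale byte."""
    # E8M0 is just an 8-bit exponent with bias 127, i.e. a power-of-two scale.
    scale = 2.0 ** (scale_byte - 127)
    return [e2m1_to_float(n) * scale for n in nibbles]

# Example: scale_byte 127 gives scale 1.0, so elements decode directly.
# The representable E2M1 magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, 6.
decode_mxfp4_block([0b0000, 0b0001, 0b0111, 0b1111], 127)
# -> [0.0, 0.5, 6.0, -6.0]
```

Since the shared scale is a pure power of two, scaling a block never loses precision in the elements themselves; it just shifts the representable range to fit the block's largest magnitude.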
[1] https://www.opencompute.org/blog/amd-arm-intel-meta-microsof...
[2] https://www.opencompute.org/documents/ocp-microscaling-forma...