A production-ready PyTorch implementation of a modern autoregressive Transformer decoder that combines the best ideas from LLaMA and Qwen-3 architectures.