A production-ready PyTorch implementation of a modern autoregressive Transformer decoder that combines design ideas from the LLaMA and Qwen-3 architectures.