XAttention is a plug-and-play sparse attention framework for Transformers that speeds up long-context inference by up to 13.5× without sacrificing accuracy. It introduces a lightweight metric based ...
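The description of the metric is cut off above; the XAttention paper scores the importance of attention blocks by summing antidiagonal values within each block and keeps only the highest-scoring blocks. A minimal NumPy sketch of that idea follows. Function names, the block size, and the keep ratio are illustrative assumptions, not the project's actual API:

```python
import numpy as np

def antidiagonal_block_scores(attn, block=4):
    """Score each (block x block) tile of an (L, L) attention matrix
    by the sum of its antidiagonal entries, a cheap proxy for tile
    importance (the idea behind XAttention's scoring; details here
    are a simplified sketch)."""
    L = attn.shape[0]
    nb = L // block  # assumes L is a multiple of `block` for simplicity
    scores = np.zeros((nb, nb))
    for i in range(nb):
        for j in range(nb):
            tile = attn[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # Flip left-right so the antidiagonal becomes the main
            # diagonal, then take the trace to sum it.
            scores[i, j] = np.fliplr(tile).trace()
    return scores

def select_blocks(scores, keep_ratio=0.25):
    """Keep the top `keep_ratio` fraction of tiles; return a boolean
    mask marking which tiles of the attention matrix to compute."""
    k = max(1, int(keep_ratio * scores.size))
    thresh = np.sort(scores.ravel())[-k]
    return scores >= thresh
```

Dense attention would then be computed only inside the selected tiles, which is where the speedup comes from; everything outside the mask is skipped.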