Recent results show that large language models struggle with compositional tasks, suggesting a hard limit to their abilities.