
We trained “critique-writing” models to explain flaws in summaries. Human evaluators find flaws in summaries substantially more often when shown our model’s critiques. Larger models are better at self-critiquing, with scale improving critique-writing faster than summary-writing. This shows promise for using AI systems to assist human supervision of AI systems on difficult tasks.