Accepted Papers:
The State and Fate of Summarization Datasets
Noam Dahan, Gabriel Stanovsky
NAACL 2025
We review 133 summarization datasets across 100+ languages, introduce a dataset ontology, and identify key bottlenecks shaping the field, including data quality issues, limited diversity, and an overreliance on distant supervision. We further make our findings accessible through an interactive platform.
Paper, Repo

PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation
Eliya Habba*, Noam Dahan*, Gili Lior, Gabriel Stanovsky
*equal contribution
EMNLP 2025, Demo
PromptSuite is a toolkit for multi-prompt evaluation that offers both an API and an interactive interface. The framework addresses prompt sensitivity, where small variations in a prompt can lead to significant performance differences. It is flexible, modular, and extensible, and adapts to a wide range of tasks out of the box.
Paper, Demo Video, Repo

Preprints:
Leveraging Digitized Newspapers to Collect Summarization Data in Low-Resource Languages
Noam Dahan, Omer Kidron, Gabriel Stanovsky
Under Review
High-quality summarization data remains scarce in under-represented languages. However, historical newspapers, made available through recent digitization efforts, offer an abundant source of untapped, naturally annotated data. In this work, we present a novel method for collecting naturally occurring summaries via Front-Page Teasers, where editors summarize full-length articles.
Paper

