# Models
The following papers have models implemented in Repro.
## Summarization

Paper | Authors | Docs
---|---|---
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer |
Text Summarization with Pretrained Encoders | Yang Liu and Mirella Lapata |
GSum: A General Framework for Guided Neural Abstractive Summarization | Zi-Yi Dou, Pengfei Liu, Hiroaki Hayashi, Zhengbao Jiang, and Graham Neubig |
## Text Generation Evaluation

Paper | Authors | Docs
---|---|---
ROUGE: A Package for Automatic Evaluation of Summaries | Chin-Yew Lin |
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary | Daniel Deutsch, Tania Bedrax-Weiss, and Dan Roth |
BLEURT: Learning Robust Metrics for Text Generation | Thibault Sellam, Dipanjan Das, and Ankur P. Parikh |
BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi |
BLEU: A Method for Automatic Evaluation of Machine Translation | Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu |
QuestEval: Summarization Asks for Fact-based Evaluation | Thomas Scialom, Paul-Alexis Dray, Patrick Gallinari, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, and Alex Wang |
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance | Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, and Steffen Eger |
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization | Esin Durmus, He He, and Mona Diab |
Evaluating Factuality in Generation with Dependency-level Entailment | Tanya Goyal and Greg Durrett |
Evaluating the Factual Consistency of Abstractive Text Summarization | Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher |
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models | Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, and Jacopo Staiano |
NUBIA: NeUral Based Interchangeability Assessor for Text Generation | Hassan Kane, Muhammed Yusuf Kocyigit, Ali Abdalla, Pelkins Ajanoh, and Mohamed Coulibali |
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing | Brian Thompson and Matt Post |
Finding a Balanced Degree of Automation for Summary Evaluation | Shiyue Zhang and Mohit Bansal |
BARTScore: Evaluating Generated Text as Text Generation | Weizhe Yuan, Graham Neubig, and Pengfei Liu |
CLIPScore: A Reference-free Evaluation Metric for Image Captioning | Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi |
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization | Yang Gao, Wei Zhao, and Steffen Eger |
Fill in the BLANC: Human-free quality estimation of document summaries | Oleg Vasilyev, Vedant Dharnidharka, and John Bohannon |
Meteor Universal: Language Specific Translation Evaluation for Any Target Language | Michael Denkowski and Alon Lavie |
COMET: A Neural Framework for MT Evaluation | Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie |
Automatic Text Evaluation through the Lens of Wasserstein Barycenters | Pierre Colombo, Guillaume Staerman, Chloe Clavel, and Pablo Piantanida |
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation | Pierre Colombo, Chloe Clavel, and Pablo Piantanida |
A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions | Guillaume Staerman, Pavlo Mozharovskyi, Pierre Colombo, Stéphan Clémençon, and Florence d’Alché-Buc |
Just Ask! Evaluating Machine Translation by Asking and Answering Questions | Mateusz Krubinski, Erfan Ghadery, Marie-Francine Moens, and Pavel Pecina |
## Question Answering

Paper | Authors | Docs
---|---|---
Neural Module Networks for Reasoning over Text | Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner |
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary | Daniel Deutsch, Tania Bedrax-Weiss, and Dan Roth |
## Question Generation

Paper | Authors | Docs
---|---|---
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary | Daniel Deutsch, Tania Bedrax-Weiss, and Dan Roth |
Asking It All: Generating Contextualized Questions for any Semantic Role | Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, and Ido Dagan |
## Parsing

Paper | Authors | Docs
---|---|---
Multilingual Constituency Parsing with Self-Attention and Pre-Training | Nikita Kitaev, Steven Cao, and Dan Klein |
## Others

Paper | Authors | Docs
---|---|---
RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text | Liam Dugan, Daphne Ippolito, Arun Kirubarajan, and Chris Callison-Burch |
Learning to Capitalize with Character-Level Recurrent Neural Networks: An Empirical Study | Raymond Hendy Susanto, Hai Leong Chieu, and Wei Lu |
MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics | Anthony Chen, Gabriel Stanovsky, Sameer Singh, and Matt Gardner |
Large-Scale QA-SRL Parsing | Nicholas FitzGerald, Julian Michael, Luheng He, and Luke Zettlemoyer |