Changelog#

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased#

v0.1.6 - 2022-07-31#

Added#

Added the ParallelModel class as an easy abstraction over the joblib library for parallel computation.
Added an aggregate_parallel_metrics function to make using metrics in parallel easier.
Added MTEQE
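
As a rough illustration of the chunk-and-merge pattern behind parallel scoring, here is a minimal sketch. It is not the library's ParallelModel or aggregate_parallel_metrics API: it uses the standard-library concurrent.futures module in place of joblib, and score_in_parallel and toy_length_metric are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def score_in_parallel(
    score_fn: Callable[[Sequence[str]], List[Dict[str, float]]],
    inputs: Sequence[str],
    num_workers: int = 2,
) -> List[Dict[str, float]]:
    """Split `inputs` into contiguous chunks, score each chunk in a worker,
    and concatenate the per-input results in the original order."""
    if not inputs:
        return []
    chunk_size = -(-len(inputs) // num_workers)  # ceiling division
    chunks = [inputs[i:i + chunk_size] for i in range(0, len(inputs), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in the same order as the chunks.
        results = list(pool.map(score_fn, chunks))
    return [item for chunk_result in results for item in chunk_result]


def toy_length_metric(texts: Sequence[str]) -> List[Dict[str, float]]:
    # Hypothetical metric (assumption): the number of whitespace-separated tokens.
    return [{"length": float(len(t.split()))} for t in texts]


scores = score_in_parallel(toy_length_metric, ["a b c", "d e", "f"], num_workers=2)
```

Because the chunks are contiguous and the results are merged in chunk order, the output lines up one-to-one with the inputs regardless of which worker finishes first.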

Changed#

Split Prism into reference-based Prism and reference-free PrismSrc. They now support multi-reference and multi-source via averaging over the references/sources.
Relaxed the pytest dependency so that it no longer requires a specific version.

v0.1.5 - 2022-03-21#

Added#

Added BaryScore, InfoLM, and DepthScore
Added ability to set beam_size and nbest parameters for BART.
Added GPU support for MoverScore

Changed#

Changed the backend implementation of MoverScore to use a non-IDF, dict-based version.
Changed the default BLEURT version to "BLEURT-20" instead of "bleurt-base-128" and enabled length-batched optimization.

v0.1.4 - 2022-01-29#

Changed#

Relaxed the datasets version requirement to match the GEM Metrics library
Moved some dependencies into dev-requirements.txt

Fixed#

Removed warnings that could occur when the Docker clients were not closed.

v0.1.3 - 2022-01-22#

Added#

Added CLIPScore
Added a QA SRL Parser
Added SUPERT
Added BLANC
Added METEOR
Added a role question generator from Pyatkin et al. (2021)
Added support for using Prism as an MT model
Added COMET

Fixed#

Fixed an error in Lite3Pyramid by updating to a newer version of the code.

v0.1.2 - 2021-10-07#

Changed#

Changed backend of Lite3Pyramid to use our own fork of the official repo with some modifications.

v0.1.1 - 2021-10-05#

Added#

Added Benepar
Added Lite3Pyramid
Added BARTScore

Changed#

Fixed a variable name typo: DOCKERHUB_REPRO to DOCKERHUB_REPO

v0.1.0 - 2021-08-10#

Added#

Added DAE
Added FactCC and FactCCX
Added utilities to remove empty inputs and insert values at specific indices
Added automatically building and publishing model images
Added a command to pull default Docker images for each model
Added SummaQA
Added NUBIA
Added Prism
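
The utilities for removing empty inputs and inserting values at specific indices can be pictured with the sketch below. The function names remove_empty and insert_at, their signatures, and the fill value of 0.0 are assumptions made for illustration; they are not taken from the library.

```python
from typing import List, Sequence, Tuple


def remove_empty(inputs: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return the non-empty inputs and the original indices of the dropped ones."""
    kept: List[str] = []
    empty_indices: List[int] = []
    for i, text in enumerate(inputs):
        if text.strip():
            kept.append(text)
        else:
            empty_indices.append(i)
    return kept, empty_indices


def insert_at(values: List[float], indices: Sequence[int], fill: float) -> List[float]:
    """Re-insert `fill` at each original index so the output aligns with the input."""
    result = list(values)
    # Ascending order keeps every later index valid after earlier insertions.
    for i in sorted(indices):
        result.insert(i, fill)
    return result


inputs = ["a b", "", "c", "  "]
kept, empty_indices = remove_empty(inputs)
# Hypothetical metric (assumption): token count of each non-empty input.
partial_scores = [float(len(text.split())) for text in kept]
full_scores = insert_at(partial_scores, empty_indices, fill=0.0)
```

Filtering before scoring keeps empty strings away from a model that may not handle them, and re-inserting a default afterwards preserves the one-score-per-input contract.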

Changed#

BERTScore now returns 0 for its metrics if the input is empty.
BLEURT now returns the mean and max scores over the references.
Changed Lewis et al. (2020) to download CNN/DM and XSum models by default
Changed Liu et al. (2019) to download all models by default
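
To make the "mean and max scores over the references" behavior concrete, here is a minimal sketch of that aggregation. It does not call BLEURT: mean_and_max and toy_length_ratio are hypothetical names, and the toy scorer merely stands in for a real pairwise metric.

```python
from typing import Callable, Dict, Sequence


def mean_and_max(
    score_fn: Callable[[str, str], float],
    candidate: str,
    references: Sequence[str],
) -> Dict[str, float]:
    """Score the candidate against every reference and report both aggregates."""
    if not references:
        raise ValueError("at least one reference is required")
    scores = [score_fn(candidate, ref) for ref in references]
    return {"mean": sum(scores) / len(scores), "max": max(scores)}


def toy_length_ratio(candidate: str, reference: str) -> float:
    # Hypothetical scorer (assumption): shorter token count over longer token count.
    c, r = len(candidate.split()), len(reference.split())
    return min(c, r) / max(c, r) if max(c, r) else 0.0


result = mean_and_max(toy_length_ratio, "a b", ["a b", "a b c d"])
```

Reporting both values lets a caller choose between rewarding agreement with the closest reference (max) and agreement with the reference set as a whole (mean).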

v0.0.3 - 2021-08-04#

Added#

Added BLEURT
Added BERTScore
Added BLEU and SentBLEU
Added QuestEval
Added MoverScore
Added FEQA

Changed#

Changed the QAEval interface to match other text generation metrics. The backend was also changed to not rely on SacreROUGE.

v0.0.2 - 2021-07-30#

Added#

Added a RecipeGenerationModel class
Added a recipe generation model from Dugan et al. (2020)
Added a TruecasingModel class
Added an RNN-based truecaser from Susanto et al. (2016) based on an implementation here.
Added the question-generation and question-answering models used in the QAEval metric. See here.
Added ROUGE
Added --predict-kwargs arguments to the predict command
Added support for running and writing evaluation metrics, for instance, ROUGE.
Added a jsonl dataset reader (JSONLinesDatasetReader)
Added the SQuADv2Evaluation metric
Added the BART-based sentence-guided models from Dou et al. (2021).
Added the LERC model from Chen et al. (2020)
Added the QAEval metric
Added a wrapper around the original Perl implementation of ROUGE. See here

Changed#

Renamed the --model-args, --dataset-reader-args, and --output-write-args predict arguments to --model-kwargs, --dataset-reader-kwargs, and --output-write-kwargs.
Renamed the --output-file argument in predict to --output to allow for output files or directories.

v0.0.1 - 2021-07-22#

Added#

Initial prototype of the library with setup and predict commands as well as implementations of Gupta et al. (2020), Lewis et al. (2020), and Liu & Lapata (2019).