Changelog#
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased#
v0.1.6 - 2022-07-31#
Added#
Added the ParallelModel class as an easy abstraction over the joblib library for parallel computation.
Added an aggregate_parallel_metrics function to make using metrics in parallel easier.
Added MTEQE
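The ParallelModel and aggregate_parallel_metrics entries above concern running metrics in parallel and combining the results. As a rough illustration of that pattern only, the sketch below splits the inputs into chunks, scores the chunks concurrently, and merges the per-input scores. It uses the standard library's ThreadPoolExecutor as a stand-in for joblib, and toy_metric, run_in_parallel, and aggregate are hypothetical names, not this library's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]


def toy_metric(texts: Sequence[str]) -> List[Scores]:
    """Hypothetical metric: returns one dict of scores per input text."""
    return [{"num_tokens": float(len(t.split()))} for t in texts]


def run_in_parallel(
    metric: Callable[[Sequence[str]], List[Scores]],
    texts: Sequence[str],
    num_workers: int = 2,
) -> List[Scores]:
    """Score contiguous chunks concurrently and restore the input order."""
    chunk_size = max(1, -(-len(texts) // num_workers))  # ceiling division
    chunks = [texts[i : i + chunk_size] for i in range(0, len(texts), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in submission order, so ordering is preserved.
        chunk_results = list(pool.map(metric, chunks))
    return [scores for chunk in chunk_results for scores in chunk]


def aggregate(results: List[Scores]) -> Scores:
    """Average each score over all inputs (an assumed aggregation rule)."""
    if not results:
        return {}
    return {key: sum(r[key] for r in results) / len(results) for key in results[0]}


if __name__ == "__main__":
    per_input = run_in_parallel(toy_metric, ["a b", "c", "d e f"])
    print(per_input)
    print(aggregate(per_input))
```

Threads keep the sketch dependency-free. Process-based workers, which joblib can provide, are usually the better fit for CPU-bound metrics.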
Changed#
Split Prism into reference-based Prism and reference-free PrismSrc. They now support multi-reference and multi-source via averaging over the references/sources.
Relaxed the dependency on pytest so it does not require a specific version
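A minimal sketch of what averaging over references can look like, assuming a metric that scores one candidate against one reference at a time. The names average_over_references and token_overlap are hypothetical, and the toy overlap score only stands in for Prism; the library's actual implementation may differ.

```python
from typing import Callable, Sequence


def token_overlap(candidate: str, reference: str) -> float:
    """Toy stand-in for a real metric: fraction of reference tokens
    that also appear in the candidate."""
    candidate_tokens = set(candidate.split())
    reference_tokens = reference.split()
    if not reference_tokens:
        return 0.0
    hits = sum(token in candidate_tokens for token in reference_tokens)
    return hits / len(reference_tokens)


def average_over_references(
    score_fn: Callable[[str, str], float],
    candidate: str,
    references: Sequence[str],
) -> float:
    """Score the candidate against each reference and return the mean."""
    if not references:
        raise ValueError("at least one reference is required")
    return sum(score_fn(candidate, ref) for ref in references) / len(references)


if __name__ == "__main__":
    refs = ["the cat", "a dog sat down"]
    print(average_over_references(token_overlap, "the cat sat", refs))
```

The same loop applies to the reference-free case by iterating over sources instead of references.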
v0.1.5 - 2022-03-21#
Added#
Added ability to set beam_size and nbest parameters for BART.
Added GPU support for MoverScore
Changed#
Changed the backend implementation of MoverScore to use a non-IDF dict based version.
Changed the default BLEURT version to use "BLEURT-20" instead of "bleurt-base-128" and to use length-batched optimization.
v0.1.4 - 2022-01-29#
Changed#
Relaxed the datasets version requirement to match the GEM Metrics library
Moved some dependencies into dev-requirements.txt
Fixed#
Removed warnings that may happen if the Docker clients are not closed.
v0.1.3 - 2022-01-22#
Added#
Added CLIPScore
Added a QA SRL Parser
Added SUPERT
Added BLANC
Added METEOR
Added a role question generator from Pyatkin et al. (2021)
Added using Prism as an MT model
Added COMET
Fixed#
Fixed an error in Lite3Pyramid by updating to a newer version of the code.
v0.1.2 - 2021-10-07#
Changed#
Changed backend of Lite3Pyramid to use our own fork of the official repo with some modifications.
v0.1.1 - 2021-10-05#
Added#
Added Benepar
Added Lite3Pyramid
Added BARTScore
Changed#
Fixed silly variable name typo: DOCKERHUB_REPRO to DOCKERHUB_REPO
v0.1.0 - 2021-08-10#
Added#
Added DAE
Added FactCC and FactCCX
Added utilities to remove empty inputs and insert values at specific indices
Added automatically building and publishing model images
Added a command to pull default Docker images for each model
Added SummaQA
Added NUBIA
Added Prism
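The utilities for removing empty inputs and inserting values at specific indices (listed above) suggest a filter, score, and restore pattern. The sketch below shows one way that pattern can work; remove_empty_inputs and insert_at_indices are hypothetical names chosen for illustration, not the library's actual functions.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def remove_empty_inputs(texts: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return the non-empty texts and the original indices of the empty ones."""
    kept: List[str] = []
    empty_indices: List[int] = []
    for index, text in enumerate(texts):
        if text.strip():
            kept.append(text)
        else:
            empty_indices.append(index)
    return kept, empty_indices


def insert_at_indices(values: Sequence[T], indices: Sequence[int], fill: T) -> List[T]:
    """Insert `fill` so that it ends up at each of the given original indices."""
    result = list(values)
    # Ascending order guarantees every earlier position is already in place.
    for index in sorted(indices):
        result.insert(index, fill)
    return result


if __name__ == "__main__":
    texts = ["a b", "", "c", "  "]
    kept, empty_indices = remove_empty_inputs(texts)
    scores = [float(len(t.split())) for t in kept]  # toy per-input score
    print(insert_at_indices(scores, empty_indices, 0.0))
```

Restoring the placeholders keeps the output aligned one-to-one with the original inputs, which matters when scores are later joined back to their instances.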
Changed#
BERTScore now returns 0 for its metrics if the input is empty.
BLEURT now returns the mean and max scores over the references.
Changed Lewis et al. (2020) to download CNN/DM and XSum models by default
Changed Liu et al. (2019) to download all models by default
v0.0.3 - 2021-08-04#
Added#
Added BLEURT
Added BERTScore
Added BLEU and SentBLEU
Added QuestEval
Added MoverScore
Added FEQA
Changed#
Changed the QAEval interface to match other text generation metrics. The backend was also changed to not rely on SacreROUGE.
v0.0.2 - 2021-07-30#
Added#
Added a RecipeGenerationModel class
Added a recipe generation model from Dugan et al. (2020)
Added a TruecasingModel class
Added an RNN-based truecaser from Susanto et al. (2016) based on an implementation here.
Added the question-generation and question-answering models used in the QAEval metric. See here.
Added ROUGE
Added --predict-kwargs arguments to the predict command
Added support for running and writing evaluation metrics, for instance, ROUGE.
Added a jsonl dataset reader (JSONLinesDatasetReader)
Added the SQuADv2Evaluation metric
Added the BART-based sentence-guided models from Dou et al. (2021).
Added the LERC model from Chen et al. (2020)
Added the QAEval metric
Added a wrapper around the original Perl implementation of ROUGE. See here
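For readers unfamiliar with the jsonl format handled by the dataset reader listed above, the sketch below shows the general idea: one JSON value per line, parsed lazily. The function read_jsonl is a hypothetical illustration and does not reflect the interface of JSONLinesDatasetReader.

```python
import io
import json
from typing import Any, Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line of the stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid JSON on line {line_number}") from err


if __name__ == "__main__":
    data = io.StringIO('{"id": 1, "text": "hello"}\n\n{"id": 2, "text": "world"}\n')
    for instance in read_jsonl(data):
        print(instance["id"], instance["text"])
```

Reading lazily keeps memory use flat for large datasets, since only one instance is held at a time.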
Changed#
Renamed the --model-args, --dataset-reader-args, and --output-write-args predict arguments to --model-kwargs, --dataset-reader-kwargs, and --output-write-kwargs.
Renamed the --output-file argument in predict to --output to allow for output files or directories.
v0.0.1 - 2021-07-22#
Added#
Initial prototype of the library with setup and predict commands as well as implementations of Gupta et al. (2020), Lewis et al. (2020), and Liu & Lapata (2019).