Sellam et al. (2020)#
Publication#
Repositories#
https://github.com/google-research/bleurt
Available Models#
The BLEURT
class can be instantiated with the checkpoints provided by the original repository.
See here for the list.
The corresponding model names are "BLEURT-20"
, "BLEURT-20-{D12,D6,D3}"
or "bleurt-{tiny,base,large}-{128,512}"
and should be passed to the constructor of the class.
BLEURT
Description: A learned evaluation metric for natural language generation
Name:
sellam2020-bleurt
Usage:
from repro.models.sellam2020 import BLEURT model = BLEURT(model="BLEURT-20") inputs = [ {"candidate": "The candidate text", "references": ["The reference", "The other reference"]} ] scores = model.predict_batch(inputs)
Implementation Notes#
The original BLEURT code only supports single references. Our implementation return both the mean and the max BLEURT score over the references (they will be equal if there is only 1 reference).
Docker Information#
Image name:
sellam2020
Build command:
repro setup sellam2020 \ [--not-tiny-128] \ [--not-base-128] \ [--not-bleurt-20] \ [--tiny-512] \ [--base-512] \ [--large-128] \ [--large-512] \ [--bleurt-20-d12] \ [--bleurt-20-d6] \ [--bleurt-20-d3] \ [--silent]
The arguments specify which BLEURT models should be downloaded.
BLEURT-20
,bleurt-tiny-128
, andbleurt-base-128
are downloaded by default.Requires network: No
Testing#
Explain how to run the unittests for this model
repro setup sellam2020
pytest models/sellam2020/tests
Status#
[x] Regression unit tests pass
[x] Correctness unit tests pass
The unit tests are based on examples in the official repository. See here.[ ] Model runs on full test dataset
Not tested[ ] Predictions approximately replicate results reported in the paper
Not tested[ ] Predictions exactly replicate results reported in the paper
Not tested
Changelog#
v1.1#
Upgraded to set BLEURT-20 as the default model and use the faster length-batched implementation