Yuan et al. (2021)

Publication

BARTScore: Evaluating Generated Text as Text Generation

Repositories

https://github.com/neulab/BARTScore

Available Models

  • BARTScore

    • Description: A text generation evaluation metric based on BART

    • Name: yuan2021-bartscore

    • Usage:

      from repro.models.yuan2021 import BARTScore

      # "cnn" selects the facebook/bart-large-cnn checkpoint (see below)
      model = BARTScore(model="cnn")
      inputs = [
          {"candidate": "The candidate text", "references": ["The references"]}
      ]
      # macro: BARTScores averaged over all inputs; micro: one score per input
      macro, micro = model.predict_batch(inputs)
      

      macro and micro are the average and per-input BARTScores, respectively. Three models are supported: "default", "cnn", and "parabank". "default" uses the facebook/bart-large checkpoint, "cnn" uses the facebook/bart-large-cnn checkpoint, and "parabank" starts from the facebook/bart-large-cnn checkpoint and loads weights trained on ParaBank.
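      For intuition, the quantity BARTScore measures is the average token-level log-likelihood of one text given another under a BART checkpoint. The following is a minimal sketch of that idea using Hugging Face transformers directly; it is not the repro or BARTScore implementation (the wrapper above handles everything inside its Docker image), and it simplifies details such as scoring direction and batching. The bart_score function name is our own illustration.

      import torch
      from transformers import BartForConditionalGeneration, BartTokenizer

      # Any of the checkpoints above works here; "cnn" corresponds to this one.
      name = "facebook/bart-large-cnn"
      tokenizer = BartTokenizer.from_pretrained(name)
      model = BartForConditionalGeneration.from_pretrained(name).eval()

      def bart_score(source: str, target: str) -> float:
          # Score the target text conditioned on the source text.
          src = tokenizer(source, return_tensors="pt", truncation=True)
          tgt = tokenizer(target, return_tensors="pt", truncation=True)
          with torch.no_grad():
              # With labels supplied, the model returns the mean cross-entropy
              # over the target tokens, i.e. the negative average log-likelihood.
              loss = model(**src, labels=tgt["input_ids"]).loss
          # Negate so that higher scores mean more likely (better) text.
          return -loss.item()

      print(bart_score("The references", "The candidate text"))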

Implementation Notes

Docker Information

  • Image name: danieldeutsch/yuan2021:1.0

  • Docker Hub:

  • Build command:

    repro setup yuan2021 [--silent]
    
  • Requires network: Yes. A request is still sent at runtime even though the models are pre-cached in the image.

Testing

repro setup yuan2021
pytest models/yuan2021/tests

Status

  • [x] Regression unit tests pass

  • [x] Correctness unit tests pass
    We verify the outputs against the examples in their GitHub README.

  • [ ] Model runs on full test dataset
    Not tested

  • [ ] Predictions approximately replicate results reported in the paper
    Not tested

  • [ ] Predictions exactly replicate results reported in the paper
    Not tested

Changelog