Kryściński et al. (2019)

Publication

Evaluating the Factual Consistency of Abstractive Text Summarization

Repositories

https://github.com/salesforce/factCC

Available Models

This implementation wraps the FactCC and FactCCX models. Both models return a score and a label for each input: the label is the binary classification decision, with label 1 meaning the candidate is “incorrect” (factually inconsistent), and the score is the model’s probability for that returned label.
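
For example, a single per-input prediction can be read as follows (the key names in this sketch are assumptions for illustration, not guaranteed field names in the wrapper’s output):

# Hypothetical micro entry; the exact keys depend on the repro wrapper's output.
prediction = {"label": 1, "score": 0.93}

# Label 1 means "incorrect": the candidate is judged factually inconsistent
# with its sources, and the score is the probability of that returned label.
verdict = "inconsistent" if prediction["label"] == 1 else "consistent"
print(f"Candidate judged {verdict} (p={prediction['score']:.2f})")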

  • FactCC:

    • Description: A model that scores the factual consistency of a candidate text against its source

    • Name: kryscinski2019-factcc

    • Usage:

      from repro.models.kryscinski2019 import FactCC
      model = FactCC()
      # Each input pairs a candidate text with one or more source texts
      inputs = [
          {"candidate": "The candidate text", "sources": ["The source text"]}
      ]
      macro, micro = model.predict_batch(inputs)
      

      macro contains the scores averaged over all of the inputs, whereas micro contains the score for each individual input (see the batch sketch following this list).

  • FactCCX:

    • Description: The explainable variant of FactCC for scoring the factual consistency of text

    • Name: kryscinski2019-factccx

    • Usage:

      from repro.models.kryscinski2019 import FactCCX
      model = FactCCX()
      inputs = [
          {"candidate": "The candidate text", "sources": ["The source text"]}
      ]
      macro, micro = model.predict_batch(inputs)
      

      macro contains the scores averaged over all of the inputs, whereas micro contains the score for each individual input.
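
A minimal sketch of the macro/micro distinction, using FactCC (the printed structure is illustrative; the exact metric keys come from the wrapper):

from repro.models.kryscinski2019 import FactCC

model = FactCC()
inputs = [
    {"candidate": "First claim", "sources": ["First source document"]},
    {"candidate": "Second claim", "sources": ["Second source document"]},
]
macro, micro = model.predict_batch(inputs)

# micro has one entry per input; macro averages the scores across all inputs.
assert len(micro) == len(inputs)
print(macro)  # aggregated (averaged) scores
print(micro)  # per-input scores and labels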

Implementation Notes

  • We modified the prediction script because the original saved only the models’ predicted labels, not their scores. The modified script can be found here; the sketch below illustrates the idea.
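
A rough sketch of what that modification amounts to (illustrative only, not the actual modified script): keep the softmax probability of the predicted label instead of discarding it.

import torch

# Illustrative sketch; names and shapes are assumptions, not the real script.
# logits: model outputs of shape (batch_size, 2) for the binary task
logits = torch.tensor([[0.3, 2.1], [1.8, -0.5]])

probs = torch.softmax(logits, dim=-1)  # class probabilities per example
labels = torch.argmax(probs, dim=-1)   # 0 = correct, 1 = incorrect
scores = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # prob of the label

for label, score in zip(labels.tolist(), scores.tolist()):
    print(label, round(score, 3))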

Docker Information

  • Image name: kryscinski2019

  • Build command:

    repro setup kryscinski2019 [--silent]
    
  • Requires network: No

Testing

repro setup kryscinski2019
pytest models/kryscinski2019/tests

Status

  • [x] Regression unit tests pass
    See here

  • [ ] Correctness unit tests pass
    No examples provided in the original repo

  • [x] Model runs on full test dataset
    See our reproducibility experiment here

  • [x] Predictions approximately replicate results reported in the paper
    See our reproducibility experiment here

  • [ ] Predictions exactly replicate results reported in the paper
    Not tested