Goyal & Durrett (2020)#

Publication#

Evaluating Factuality in Generation with Dependency-level Entailment

Repositories#

https://github.com/tagoyal/dae-factuality

Available Models#

This implementation wraps the DAE evaluation metric. Three versions are available: dae_basic, dae_w_syn, and dae_w_syn_hallu. Select one with the model parameter of the constructor.
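As a minimal sketch of choosing a version, the helper below checks a name against the three versions listed above before passing it to the constructor's model parameter. The helper `build_dae` and the tuple `DAE_VARIANTS` are hypothetical and not part of repro. The import is deferred so that the name check works even where repro is not installed.

```python
# Version names as listed in this document.
DAE_VARIANTS = ("dae_basic", "dae_w_syn", "dae_w_syn_hallu")


def build_dae(model_name="dae_w_syn"):
    """Hypothetical helper: validate the version name, then construct DAE."""
    if model_name not in DAE_VARIANTS:
        raise ValueError(
            f"Unknown DAE version {model_name!r}; expected one of {DAE_VARIANTS}"
        )
    # Deferred import so the validation above does not require repro.
    from repro.models.goyal2020 import DAE

    return DAE(model=model_name)
```

Calling `build_dae("dae_w_syn_hallu")` would then be equivalent to `DAE(model="dae_w_syn_hallu")`, with a clearer error message for a misspelled name.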

DAE

Description: A dependency-based factuality metric
Name: goyal2020-dae

Usage:

from repro.models.goyal2020 import DAE
# "dae_w_syn" is the default model
model = DAE()
inputs = [
    {"candidate": "The candidate sentence", "sources": ["The source sentence"]}
]
macro, micro = model.predict_batch(inputs)

macro is the average DAE score over all inputs, and micro contains the individual DAE score for each input.
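The relationship between the two return values can be sketched as follows. This assumes the per-input scores are plain floats and that "average" means the arithmetic mean. The actual return structure of predict_batch may differ, and `macro_from_micro` is a hypothetical helper, not part of repro.

```python
from statistics import mean


def macro_from_micro(micro_scores):
    """Hypothetical helper: average per-input scores into one macro score.

    Assumes each entry of micro_scores is a plain float.
    """
    if not micro_scores:
        raise ValueError("At least one per-input score is required")
    return mean(micro_scores)


# Made-up illustrative values, not real DAE outputs.
example_micro = [0.5, 1.0]
example_macro = macro_from_micro(example_micro)
```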

Implementation Notes#

The implementation supports only a single source per input, so the length of "sources" must be 1.
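One way to catch violations of this constraint before calling the model is a small pre-check, sketched below. The function `check_single_source` is hypothetical and not part of repro. It assumes inputs follow the dictionary format shown in the usage example.

```python
def check_single_source(inputs):
    """Hypothetical pre-check: every input must have exactly one source."""
    for index, item in enumerate(inputs):
        sources = item.get("sources", [])
        if len(sources) != 1:
            raise ValueError(
                f"Input {index} has {len(sources)} sources; exactly 1 is required"
            )
    return inputs


# Passes: one candidate paired with one source.
valid_inputs = check_single_source(
    [{"candidate": "The candidate sentence", "sources": ["The source sentence"]}]
)
```

Since the function returns the inputs unchanged, its result could be passed straight to `model.predict_batch`.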

Docker Information#

Image name: goyal2020
Build command:
```
repro setup goyal2020 [--silent]
```
Requires network: Yes, the CoreNLP server uses the network.

Testing#

repro setup goyal2020
pytest models/goyal2020/tests

Status#

[x] Regression unit tests pass
See here. The regression tests for “dae_basic” and “dae_w_syn” are not very strong since the scores are all around 0.9999.
[ ] Correctness unit tests pass
No example outputs provided by the original repo.
[x] Model runs on full test dataset
See here
[x] Predictions approximately replicate results reported in the paper
[x] Predictions exactly replicate results reported in the paper
See here