Colombo et al., (2021, 2022); Staerman et al., (2022)#

Publication#

This Dockerfile corresponds to three different papers:

Repositories#

All three metrics are implemented in https://github.com/PierreColombo/nlg_eval_via_simi_measures

Available Models#

BaryScore

Name: colombo2021-baryscore

Usage:

from repro.models.colombo2021 import BaryScore
model = BaryScore()
inputs = [
    {"candidate": "The candidate", "references": ["The first reference", "The second"]}
]
macro, micro = model.predict_batch(inputs)

micro contains the per-input scores and macro contains the averaged scores.

InfoLM

Name: colombo2021-infolm

Usage:

from repro.models.colombo2021 import InfoLM
model = InfoLM()
inputs = [
    {"candidate": "The candidate", "references": ["The first reference", "The second"]}
]
macro, micro = model.predict_batch(inputs)

micro contains the per-input scores and macro contains the averaged scores.

DepthScore

Name: colombo2021-depthscore

Usage:

from repro.models.colombo2021 import DepthScore
model = DepthScore()
inputs = [
    {"candidate": "The candidate", "references": ["The first reference", "The second"]}
]
macro, micro = model.predict_batch(inputs)

micro contains the per-input scores and macro contains the averaged scores.

Implementation Notes#

For some reason, the unit tests pass on some machines and not on others. On one of our dev machines, the CPU and GPU tests pass. On another, the CPU pass but the GPU do not. On GitHub Actions, the CPU tests do not pass. Since they are being run in Docker, I assume there is some difference in hardware causing this, but I do not know what the issue is.

Docker Information#

Image name: danieldeutsch/colombo2021:1.0
Build command: Provide documentation on how to build the image
```
repro setup colombo2021
```
Requires network: Yes, it sends a request for resources

Testing#

repro setup colombo2021
pytest models/colombo2021/tests

Status#

[ ] Regression unit tests pass
See the implementation notes; https://github.com/danieldeutsch/repro/runs/5210482796
[ ] Correctness unit tests pass
[ ] Model runs on full test dataset
[ ] Predictions approximately replicate results reported in the paper
[ ] Predictions exactly replicate results reported in the paper

Colombo et al., (2021, 2022); Staerman et al., (2022)#

Publication#

Repositories#

Available Models#

Implementation Notes#

Docker Information#

Testing#

Status#

Changelog#