Colombo et al., (2021, 2022); Staerman et al., (2022)#
Publication#
This Dockerfile corresponds to three different papers, one for each of the metrics BaryScore, InfoLM, and DepthScore.
Repositories#
All three metrics are implemented in https://github.com/PierreColombo/nlg_eval_via_simi_measures
Available Models#
BaryScore
Name:
colombo2021-baryscore
Usage:
from repro.models.colombo2021 import BaryScore

model = BaryScore()
inputs = [
    {"candidate": "The candidate", "references": ["The first reference", "The second"]}
]
macro, micro = model.predict_batch(inputs)
micro contains the per-input scores and macro contains the averaged scores.
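When the candidates and references already live in parallel lists, the inputs list in the format shown in the usage example can be assembled with a small helper. This is only a minimal sketch: build_inputs is a hypothetical helper that is not part of repro, and it assumes nothing beyond the candidate/references dictionary layout from the usage example.

```python
from typing import Dict, List, Union

InputDict = Dict[str, Union[str, List[str]]]


def build_inputs(candidates: List[str], references: List[List[str]]) -> List[InputDict]:
    """Pair each candidate with its list of references (hypothetical helper, not in repro)."""
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    return [
        {"candidate": cand, "references": list(refs)}
        for cand, refs in zip(candidates, references)
    ]


inputs = build_inputs(
    ["The candidate"],
    [["The first reference", "The second"]],
)
# The resulting list has the same shape as the inputs in the usage example
# and could be passed to model.predict_batch(inputs).
```

The length check fails early when a candidate has no matching reference list, rather than letting zip silently drop the unmatched items.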
InfoLM
Name:
colombo2021-infolm
Usage:
from repro.models.colombo2021 import InfoLM

model = InfoLM()
inputs = [
    {"candidate": "The candidate", "references": ["The first reference", "The second"]}
]
macro, micro = model.predict_batch(inputs)
micro contains the per-input scores and macro contains the averaged scores.
DepthScore
Name:
colombo2021-depthscore
Usage:
from repro.models.colombo2021 import DepthScore

model = DepthScore()
inputs = [
    {"candidate": "The candidate", "references": ["The first reference", "The second"]}
]
macro, micro = model.predict_batch(inputs)
micro contains the per-input scores and macro contains the averaged scores.
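For all three metrics, the relationship between the two return values can be illustrated without running the Docker image. The sketch below assumes that micro is a list with one dictionary of scores per input and that macro is the plain arithmetic mean of each score over the inputs. The function average_scores, the key name example_score, and the numbers are made up for illustration; they are not output from the actual metrics.

```python
from statistics import mean
from typing import Dict, List


def average_scores(micro: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each score key over all per-input dictionaries (illustrative only)."""
    if not micro:
        return {}
    # Assumes every per-input dictionary has the same keys as the first one.
    return {key: mean(scores[key] for scores in micro) for key in micro[0]}


# Hypothetical per-input scores for two inputs.
micro = [{"example_score": 0.25}, {"example_score": 0.75}]
macro = average_scores(micro)
```

Under these assumptions, macro holds one averaged value per score key, while micro keeps the per-input detail needed for analyses such as instance-level correlation.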
Implementation Notes#
The unit tests pass on some machines but not on others, for reasons that are not yet understood. On one of our dev machines, both the CPU and GPU tests pass. On another, the CPU tests pass but the GPU tests do not. On GitHub Actions, the CPU tests do not pass. Since the tests run in Docker, I assume a hardware difference is responsible, but I do not know what the issue is.
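If the cause turns out to be small floating-point differences between machines (an assumption; the cause has not been identified), one common mitigation is to compare scores against stored values with a tolerance instead of exactly. The following is a minimal sketch using only the standard library. The function scores_close, the tolerance values, the key name, and the numbers are all hypothetical and are not taken from the actual test suite.

```python
import math
from typing import Dict


def scores_close(
    actual: Dict[str, float],
    expected: Dict[str, float],
    rel_tol: float = 1e-4,
    abs_tol: float = 1e-6,
) -> bool:
    """Return True if both dictionaries have the same keys and nearly equal values."""
    if actual.keys() != expected.keys():
        return False
    return all(
        math.isclose(actual[key], expected[key], rel_tol=rel_tol, abs_tol=abs_tol)
        for key in expected
    )


# Hypothetical stored value and a slightly different value from another machine.
expected = {"example_score": 0.512345}
actual_other_machine = {"example_score": 0.512346}
is_match = scores_close(actual_other_machine, expected)
```

A looser tolerance can also hide genuine regressions, so the thresholds would need to be chosen with the size of the observed differences in mind.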
Docker Information#
Image name:
danieldeutsch/colombo2021:1.0
Build command:
repro setup colombo2021
Requires network: Yes, it sends a request for resources
Testing#
repro setup colombo2021
pytest models/colombo2021/tests
Status#
[ ] Regression unit tests pass
See the implementation notes; https://github.com/danieldeutsch/repro/runs/5210482796
[ ] Correctness unit tests pass
[ ] Model runs on full test dataset
[ ] Predictions approximately replicate results reported in the paper
[ ] Predictions exactly replicate results reported in the paper