# Zhang & Bansal (2021)

## Publication
Finding a Balanced Degree of Automation for Summary Evaluation
## Repositories
https://github.com/ZhangShiyue/Lite2-3Pyramid
## Available Models

### Lite3Pyramid
- Description: An automated Pyramid score that uses semantic role labeling (SRL) to extract summary content units (STUs) from the references and a pretrained NLI model to judge whether the candidate summary supports them. A conceptual sketch appears under Implementation Notes below.
- Name: `zhang2021-lite3pyramid`
- Usage:

  ```python
  from repro.models.zhang2021 import Lite3Pyramid

  model = Lite3Pyramid()

  # Score against the raw reference texts; STUs are extracted automatically.
  inputs = [
      {"candidate": "The candidate summary", "references": ["The references"]}
  ]
  macro, micro = model.predict(inputs)

  # Or provide pre-extracted STUs for each reference to skip the extraction step.
  inputs = [
      {"candidate": "The candidate summary", "units_list": [["STU 1 for reference 1", "STU 2"]]}
  ]
  macro, micro = model.predict(inputs)
  ```
`macro` is the Lite3Pyramid score averaged over all of the inputs. `micro` is the list of per-input scores, each averaged over that input's references.
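For example, a minimal sketch of consuming the two return values (the exact contents of each score object are not documented here, so inspect them against the library's actual output):

```python
from repro.models.zhang2021 import Lite3Pyramid

model = Lite3Pyramid()
inputs = [
    {"candidate": "Summary A", "references": ["Reference for A"]},
    {"candidate": "Summary B", "references": ["Reference for B"]},
]
macro, micro = model.predict(inputs)

# `macro` is a single aggregate over both inputs; `micro` is parallel
# to `inputs`, one score per candidate.
print("corpus-level:", macro)
for inp, score in zip(inputs, micro):
    print(inp["candidate"], "->", score)
```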
## Implementation Notes
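As background, the following is a conceptual sketch of the scoring step described in the paper, not the actual implementation: STUs are extracted from the references with SRL, a pretrained NLI model judges whether the candidate summary supports each STU, and the score is the fraction of STUs judged present. The function name `lite3pyramid_sketch` and the `nli_entails` predicate are hypothetical stand-ins; the real code lives in the linked repository.

```python
from typing import Callable, List


def lite3pyramid_sketch(
    candidate: str,
    units: List[str],
    nli_entails: Callable[[str, str], bool],  # hypothetical stand-in for the NLI model
) -> float:
    """Return the fraction of STUs that the candidate summary supports.

    `nli_entails(premise, hypothesis)` should return True when the premise
    (the candidate summary) entails the hypothesis (a single STU).
    """
    if not units:
        return 0.0
    supported = sum(1 for unit in units if nli_entails(candidate, unit))
    return supported / len(units)
```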
## Docker Information

- Image name: `danieldeutsch/zhang2021:1.2`
- Docker Hub: https://hub.docker.com/repository/docker/danieldeutsch/zhang2021
- Build command:

  ```
  repro setup zhang2021 [--models <model-name>+]
  ```

  The `--models` argument specifies which pretrained NLI models will be pre-cached inside of the Docker image. See the Lite2-3Pyramid repository for the available models.
- Requires network: Yes. AllenNLP sends a request for a model even if the model is available locally.
## Testing

```
repro setup zhang2021
pytest models/zhang2021/tests
```
Most of the tests require a GPU to run in a reasonable amount of time.
## Status

- [x] Regression unit tests pass
- [x] Correctness unit tests pass
  (The STU extraction gives slightly different results, but calculating the scores given a summary and STUs gives the expected result.)
- [ ] Model runs on full test dataset (not tested)
- [x] Predictions approximately replicate results reported in the paper
- [ ] Predictions exactly replicate results reported in the paper
## Changelog

### v1.2
Switched back to the original repo, which has merged our changes from v1.1 and fixed the SRL tagging error.
### v1.1

Changed to our fork of the repo, which adds support for running coreference resolution and SRL on the GPU and for saving the results to a file.