# Zhang & Bansal (2021)

## Publication
Finding a Balanced Degree of Automation for Summary Evaluation
## Repositories
https://github.com/ZhangShiyue/Lite2-3Pyramid
## Available Models

### Lite3Pyramid
- Description: An automated Pyramid score that uses semantic role labeling (SRL) to extract summary content units (STUs) from the references and a pretrained NLI model to judge whether the candidate summary supports them. A conceptual sketch appears under Implementation Notes below.
- Name: `zhang2021-lite3pyramid`
- Usage:

  ```python
  from repro.models.zhang2021 import Lite3Pyramid

  model = Lite3Pyramid()

  # Score against the raw reference texts; STUs are extracted automatically.
  inputs = [
      {"candidate": "The candidate summary", "references": ["The references"]}
  ]
  macro, micro = model.predict(inputs)

  # Or provide pre-extracted STUs for each reference to skip the extraction step.
  inputs = [
      {"candidate": "The candidate summary", "units_list": [["STU 1 for reference 1", "STU 2"]]}
  ]
  macro, micro = model.predict(inputs)
  ```
`macro` is the Lite3Pyramid score averaged over all of the inputs. `micro` is the list of per-input scores, each averaged over that input's references.
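For example, a minimal sketch of consuming the two return values (the exact contents of each score object are not documented here, so inspect them against the library's actual output):

```python
from repro.models.zhang2021 import Lite3Pyramid

model = Lite3Pyramid()
inputs = [
    {"candidate": "Summary A", "references": ["Reference for A"]},
    {"candidate": "Summary B", "references": ["Reference for B"]},
]
macro, micro = model.predict(inputs)

# `macro` is a single aggregate over both inputs; `micro` is parallel
# to `inputs`, one score per candidate.
print("corpus-level:", macro)
for inp, score in zip(inputs, micro):
    print(inp["candidate"], "->", score)
```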
## Implementation Notes
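As background, the following is a conceptual sketch of the scoring step described in the paper, not the actual implementation: STUs are extracted from the references with SRL, a pretrained NLI model judges whether the candidate summary supports each STU, and the score is the fraction of STUs judged present. The function name `lite3pyramid_sketch` and the `nli_entails` predicate are hypothetical stand-ins; the real code lives in the linked repository.

```python
from typing import Callable, List


def lite3pyramid_sketch(
    candidate: str,
    units: List[str],
    nli_entails: Callable[[str, str], bool],  # hypothetical stand-in for the NLI model
) -> float:
    """Return the fraction of STUs that the candidate summary supports.

    `nli_entails(premise, hypothesis)` should return True when the premise
    (the candidate summary) entails the hypothesis (a single STU).
    """
    if not units:
        return 0.0
    supported = sum(1 for unit in units if nli_entails(candidate, unit))
    return supported / len(units)
```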
## Docker Information

- Image name: `danieldeutsch/zhang2021:1.2`
- Docker Hub: https://hub.docker.com/repository/docker/danieldeutsch/zhang2021
- Build command:

  ```
  repro setup zhang2021 [--models <model-name>+]
  ```

  The `--models` argument specifies which pretrained NLI models will be pre-cached inside of the Docker image. See the Lite2-3Pyramid repository for the available models.
- Requires network: Yes. AllenNLP sends a request for a model even if the model is available locally.
## Testing

```
repro setup zhang2021
pytest models/zhang2021/tests
```
Most of the tests require a GPU to run in a reasonable amount of time.
## Status

- [x] Regression unit tests pass
- [x] Correctness unit tests pass
  (The STU extraction gives slightly different results, but calculating the scores given a summary and STUs gives the expected result.)
- [ ] Model runs on full test dataset (not tested)
- [x] Predictions approximately replicate results reported in the paper
- [ ] Predictions exactly replicate results reported in the paper
## Changelog

### v1.2
Switched back to the original repo, which has merged our changes from v1.1 and fixed the SRL tagging error.
### v1.1

Changed to our fork of the repo, which adds support for running coreference resolution and SRL on the GPU and for saving the results to a file.