Zhang & Bansal (2021)#

Publication#

Finding a Balanced Degree of Automation for Summary Evaluation

Repositories#

https://github.com/ZhangShiyue/Lite2-3Pyramid

Available Models#

  • Lite3Pyramid

    • Description: An automated version of the Pyramid score based on semantic role labeling (SRL)

    • Name: zhang2021-lite3pyramid

    • Usage:

      from repro.models.zhang2021 import Lite3Pyramid
      model = Lite3Pyramid()
      inputs = [
          {"candidate": "The candidate summary", "references": ["The references"]}
      ]
      macro, micro = model.predict(inputs)
      
      inputs = [
          {"candidate": "The candidate summary", "units_list": [["STU 1 for reference 1", "STU 2"]]}
      ]
      macro, micro = model.predict(inputs)
      

      macro contains the Lite3Pyramid scores averaged over all of the inputs. micro contains the per-input scores, each averaged over that input's references.
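
      As a rough illustration (a sketch, not taken from the repro documentation), the snippet below shows one way the returned objects might be inspected; the metric key name "lite3pyramid" and the example value are assumptions and may differ in the actual output.

      # Sketch only: the metric key name and value below are assumed, not documented.
      from repro.models.zhang2021 import Lite3Pyramid

      model = Lite3Pyramid()
      inputs = [
          {"candidate": "The candidate summary", "references": ["The references"]}
      ]
      macro, micro = model.predict(inputs)

      # macro: corpus-level scores averaged over all inputs
      print(macro)  # e.g., {"lite3pyramid": 0.42}  (key and value are illustrative)

      # micro: one score dictionary per input, averaged over that input's references
      for input_scores in micro:
          print(input_scores)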

Implementation Notes#

Docker Information#

  • Image name: danieldeutsch/zhang2021:1.2

  • Docker Hub: https://hub.docker.com/repository/docker/danieldeutsch/zhang2021

  • Build command:

    repro setup zhang2021 [--models <model-name>+]
    

    The --models argument specifies which pretrained NLI models will be pre-cached inside the Docker image. See the original Lite2-3Pyramid repository (linked above) for the available models.
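
    For example, pre-caching a single NLI model might look like the following; the model name shown is only illustrative (it is assumed, not taken from this documentation), so substitute a name that the original repository actually supports.

    # Illustrative only: replace the model name with one accepted by --models
    repro setup zhang2021 --models ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli
    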

  • Requires network: Yes, AllenNLP sends a request for a model even if the model is available locally.

Testing#

repro setup zhang2021
pytest models/zhang2021/tests

Most of the tests require a GPU to run in a reasonable amount of time.

Status#

  • [x] Regression unit tests pass

  • [x] Correctness unit tests pass
    The STU extraction gives slightly different results, but calculating the scores from a given summary and its STUs produces the expected results.

  • [ ] Model runs on full test dataset
    Not tested

  • [x] Predictions approximately replicate results reported in the paper

  • [ ] Predictions exactly replicate results reported in the paper

Changelog#

v1.2#

  • Switched back to the original repo, which has merged our changes from v1.1 and fixed the SRL tagging error.

v1.1#

  • Changed to our fork of the repo, which adds support for using the GPU for coreference resolution and SRL and for saving the results to a file.