Chen et al. (2020)#

Publication#

MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

Repositories#

https://github.com/anthonywchen/MOCHA

Available Models#

This implementation contains a wrapper around the LERC model trained on all of the constituent MOCHA datasets, as well as the official MOCHA evaluation script.

  • LERC

    • Description: The LERC model trained on all datasets

    • Name: chen2020-lerc

    • Usage:

      from repro.models.chen2020 import LERC
      model = LERC()
      # Arguments are the passage, the question, the reference answer,
      # and the candidate answer; the returned score is the LERC score
      score = model.predict("context", "question", "reference", "candidate")
      
  • MOCHA Evaluation

    • Description: The MOCHA evaluation script that calculates the Pearson correlation between the ground-truth and predicted scores.

    • Name: chen2020-eval

    • Usage:

      from repro.models.chen2020 import MOCHAEvaluationMetric
      model = MOCHAEvaluationMetric()
      # `inputs` should have the dataset, source, ground-truth score,
      # and predictions
      inputs = [
          {"dataset": dataset, "source": source, "score": score, "prediction": prediction},
          ...
      ]
      metrics = model.predict_batch(inputs)
      
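For scoring many examples at once, `repro` models generally expose a `predict_batch` method alongside `predict`. A minimal sketch of assembling batch inputs for LERC follows; the exact dictionary keys are an assumption inferred from the single-prediction signature above, so check the wrapper's source before relying on them:

```python
# Hypothetical batch usage: the input keys ("context", "question",
# "reference", "candidate") are assumed from the predict() signature.
examples = [
    ("context 1", "question 1", "reference 1", "candidate 1"),
    ("context 2", "question 2", "reference 2", "candidate 2"),
]
inputs = [
    {"context": c, "question": q, "reference": r, "candidate": a}
    for (c, q, r, a) in examples
]
# Running the model requires the chen2020 Docker image:
# model = LERC()
# scores = model.predict_batch(inputs)
```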
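The evaluation script's core metric is the Pearson correlation between the ground-truth and predicted scores (the `dataset` and `source` fields presumably group the correlations). For intuition, here is a minimal pure-Python sketch of that correlation, not the packaged `chen2020-eval` script itself:

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation between two equal-length lists of scores
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy ground-truth vs. predicted scores
gold = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r = pearson(gold, pred)
```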

Implementation Notes#

Docker Information#

  • Image name: chen2020

  • Build command:

    repro setup chen2020 [--silent]
    
  • Requires network: No

Testing#

To run the unit tests for this model:

repro setup chen2020
pytest models/chen2020/tests

Status#

  • [x] Regression unit tests pass
    See here

  • [ ] Correctness unit tests pass
    No expected outputs provided in the original repo

  • [x] Model runs on full test dataset
    See here

  • [x] Predictions approximately replicate results reported in the paper
    See here

  • [ ] Predictions exactly replicate results reported in the paper