# Liu & Lapata (2019)
## Publication

[Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345) (Liu & Lapata, EMNLP 2019)
## Relevant Repositories
https://github.com/nlpyang/PreSumm
## Available Models
The original GitHub repository provides 4 pretrained models:
- **Description:** Their baseline abstractive model trained on the CNN/DailyMail dataset
  **Name:** `liu2019-transformerabs`
  **Usage:**

  ```python
  from repro.models.liu2019 import TransformerAbs

  model = TransformerAbs()
  summary = model.predict("document")
  ```
- **Description:** A BERT-based extractive model trained on the CNN/DailyMail dataset
  **Name:** `liu2019-bertsumext`
  **Usage:**

  ```python
  from repro.models.liu2019 import BertSumExt

  model = BertSumExt()
  summary = model.predict("document")
  ```
- **Description:** A BERT-based abstractive model trained on the CNN/DailyMail dataset
  **Name:** `liu2019-bertsumextabs`
  **Usage:**

  ```python
  from repro.models.liu2019 import BertSumExtAbs

  model = BertSumExtAbs()  # or BertSumExtAbs("bertsumextabs_cnndm.pt")
  summary = model.predict("document")
  ```
- **Description:** A BERT-based abstractive model trained on the XSum dataset
  **Name:** `liu2019-bertsumextabs`
  **Usage:**

  ```python
  from repro.models.liu2019 import BertSumExtAbs

  model = BertSumExtAbs("bertsumextabs_xsum.pt")
  summary = model.predict("document")
  ```
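To summarize more than one document at a time, repro models also provide a batch interface. The sketch below assumes a `predict_batch` method that takes a list of dictionaries whose keys mirror the keyword arguments of `predict`; verify this against the installed version of repro before relying on it.

```python
# A minimal sketch, assuming repro models expose `predict_batch` and that it
# accepts a list of dicts mirroring the arguments of `predict`.
from repro.models.liu2019 import BertSumExtAbs

model = BertSumExtAbs()
inputs = [
    {"document": "First document to summarize."},
    {"document": "Second document to summarize."},
]
summaries = model.predict_batch(inputs)
```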
## Implementation Notes
The input to the pretrained models is expected to already be preprocessed. We therefore tried to replicate the original preprocessing steps as closely as we could: all input documents are tokenized and sentence-split using the Stanford CoreNLP library within the Docker container.

If you pass in a document that has already been split into sentences, the current implementation does not respect those sentence boundaries and will reprocess the document, as illustrated below.
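For example (assuming the repro interface accepts a pre-split document as a list of sentence strings, which should be checked against the actual API), the boundaries in the following input are recomputed rather than preserved:

```python
from repro.models.liu2019 import BertSumExt

model = BertSumExt()
# Assumption: `predict` accepts a pre-split document as a list of sentence
# strings. Even then, the text is re-tokenized and re-split with CoreNLP
# inside the container, so the boundaries below are not preserved.
summary = model.predict(["The first sentence.", "The second sentence."])
```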
## Dockerfile Information
**Image name:** `liu2019`

**Build command:**

```bash
repro setup liu2019 \
    [--not-transformerabs-cnndm] \
    [--not-bertsumext-cnndm] \
    [--not-bertsumextabs-cnndm] \
    [--not-bertsumextabs-xsum] \
    [--silent]
```
Each `--not-*` flag indicates that the corresponding model should not be downloaded (all of the models are downloaded by default).
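For example, to build an image that contains only the XSum abstractive model, skip the three CNN/DailyMail models:

```bash
repro setup liu2019 \
    --not-transformerabs-cnndm \
    --not-bertsumext-cnndm \
    --not-bertsumextabs-cnndm
```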
**Requires network:** No
## Testing
```bash
repro setup liu2019
pytest -s models/liu2019/tests
```
## Status
- [x] Regression unit tests pass
  See the latest successful tests on GitHub here.
- [ ] Correctness unit tests pass
  The authors provide their model outputs and instructions for processing the data from scratch. We did not attempt to perfectly reproduce their summaries.
- [x] Model runs on full test dataset
  See here.
- [ ] Predictions approximately replicate results reported in the paper
  The results for the abstractive models approximately replicate those reported in the paper, but the extractive model does not. See this experiment for details. Calculating the ROUGE scores against the original references, rather than against references preprocessed in the same way as during training, did not make a significant difference.

**TransformerAbs on CNN/DailyMail**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 40.21 | 17.76 | 37.09 |
| Ours     | 40.32 | 17.73 | 37.18 |

**BertSumExt on CNN/DailyMail**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 43.23 | 20.24 | 39.63 |
| Ours     | 41.88 | 18.89 | 38.17 |

**BertSumExtAbs on CNN/DailyMail**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 42.13 | 19.60 | 39.18 |
| Ours     | 42.02 | 19.34 | 39.01 |

**BertSumExtAbs on XSum**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 38.81 | 16.50 | 31.27 |
| Ours     | 38.87 | 16.40 | 31.30 |
The abstractive models appear to be faithful reproductions of the original results, whereas the extractive model is not; it is not clear why.
- [ ] Predictions exactly replicate results reported in the paper
  See above.
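For a rough sanity check of model outputs against numbers like those above, an off-the-shelf ROUGE implementation can be used. The sketch below relies on Google's `rouge_score` package rather than the perl-based ROUGE toolkit typically used for the reported numbers, so its scores will differ slightly from the tables.

```python
# A minimal sketch using the `rouge_score` package (pip install rouge-score).
# This is not the official perl ROUGE script, so the scores will not exactly
# match the numbers reported in the paper.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the gold-standard reference summary"
prediction = "the model-generated summary"

scores = scorer.score(reference, prediction)
print({name: round(score.fmeasure * 100, 2) for name, score in scores.items()})
```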