# Liu & Lapata (2019)
## Publication

[Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345) (Liu & Lapata, EMNLP 2019)
## Relevant Repositories
https://github.com/nlpyang/PreSumm
## Available Models
The original GitHub repository provides 4 pretrained models:
- **Description:** Their baseline abstractive model trained on the CNN/DailyMail dataset
  **Name:** `liu2019-transformerabs`
  **Usage:**

  ```python
  from repro.models.liu2019 import TransformerAbs

  model = TransformerAbs()
  summary = model.predict("document")
  ```
- **Description:** A BERT-based extractive model trained on the CNN/DailyMail dataset
  **Name:** `liu2019-bertsumext`
  **Usage:**

  ```python
  from repro.models.liu2019 import BertSumExt

  model = BertSumExt()
  summary = model.predict("document")
  ```
- **Description:** A BERT-based abstractive model trained on the CNN/DailyMail dataset
  **Name:** `liu2019-bertsumextabs`
  **Usage:**

  ```python
  from repro.models.liu2019 import BertSumExtAbs

  model = BertSumExtAbs()  # or BertSumExtAbs("bertsumextabs_cnndm.pt")
  summary = model.predict("document")
  ```
- **Description:** A BERT-based abstractive model trained on the XSum dataset
  **Name:** `liu2019-bertsumextabs`
  **Usage:**

  ```python
  from repro.models.liu2019 import BertSumExtAbs

  model = BertSumExtAbs("bertsumextabs_xsum.pt")
  summary = model.predict("document")
  ```
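To summarize more than one document at a time, repro models also provide a batch interface. The sketch below assumes a `predict_batch` method that takes a list of dictionaries whose keys mirror the keyword arguments of `predict`; verify this against the installed version of repro before relying on it.

```python
# A minimal sketch, assuming repro models expose `predict_batch` and that it
# accepts a list of dicts mirroring the arguments of `predict`.
from repro.models.liu2019 import BertSumExtAbs

model = BertSumExtAbs()
inputs = [
    {"document": "First document to summarize."},
    {"document": "Second document to summarize."},
]
summaries = model.predict_batch(inputs)
```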
## Implementation Notes
The input to the pretrained models is expected to already be preprocessed. We therefore tried to replicate the original preprocessing steps as closely as we could: all input documents are tokenized and sentence-split using the Stanford CoreNLP library within the Docker container.

If you pass in a document that has already been split into sentences, the current implementation does not respect those sentence boundaries and will reprocess the document, as illustrated below.
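For example (assuming the repro interface accepts a pre-split document as a list of sentence strings, which should be checked against the actual API), the boundaries in the following input are recomputed rather than preserved:

```python
from repro.models.liu2019 import BertSumExt

model = BertSumExt()
# Assumption: `predict` accepts a pre-split document as a list of sentence
# strings. Even then, the text is re-tokenized and re-split with CoreNLP
# inside the container, so the boundaries below are not preserved.
summary = model.predict(["The first sentence.", "The second sentence."])
```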
## Dockerfile Information
**Image name:** `liu2019`

**Build command:**

```bash
repro setup liu2019 \
    [--not-transformerabs-cnndm] \
    [--not-bertsumext-cnndm] \
    [--not-bertsumextabs-cnndm] \
    [--not-bertsumextabs-xsum] \
    [--silent]
```
Each `--not-*` flag indicates that the corresponding model should not be downloaded (all of the models are downloaded by default).
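For example, to build an image that contains only the XSum abstractive model, skip the three CNN/DailyMail models:

```bash
repro setup liu2019 \
    --not-transformerabs-cnndm \
    --not-bertsumext-cnndm \
    --not-bertsumextabs-cnndm
```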
**Requires network:** No
## Testing
```bash
repro setup liu2019
pytest -s models/liu2019/tests
```
## Status
- [x] Regression unit tests pass
  See the latest successful tests on GitHub here.
- [ ] Correctness unit tests pass
  The authors provide their model outputs and instructions for processing the data from scratch. We did not attempt to perfectly reproduce their summaries.
- [x] Model runs on full test dataset
  See here.
- [ ] Predictions approximately replicate results reported in the paper
  The results for the abstractive models approximately replicate those reported in the paper, but the extractive model does not. See this experiment for details. Calculating the ROUGE scores against the original references, rather than against references preprocessed in the same way as during training, did not make a significant difference.

**TransformerAbs on CNN/DailyMail**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 40.21 | 17.76 | 37.09 |
| Ours     | 40.32 | 17.73 | 37.18 |

**BertSumExt on CNN/DailyMail**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 43.23 | 20.24 | 39.63 |
| Ours     | 41.88 | 18.89 | 38.17 |

**BertSumExtAbs on CNN/DailyMail**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 42.13 | 19.60 | 39.18 |
| Ours     | 42.02 | 19.34 | 39.01 |

**BertSumExtAbs on XSum**

|          | R1    | R2    | RL    |
|----------|-------|-------|-------|
| Reported | 38.81 | 16.50 | 31.27 |
| Ours     | 38.87 | 16.40 | 31.30 |
The abstractive models appear to be faithful reproductions of the original results, whereas the extractive model is not; it is not clear why.
- [ ] Predictions exactly replicate results reported in the paper
  See above.
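For a rough sanity check of model outputs against numbers like those above, an off-the-shelf ROUGE implementation can be used. The sketch below relies on Google's `rouge_score` package rather than the perl-based ROUGE toolkit typically used for the reported numbers, so its scores will differ slightly from the tables.

```python
# A minimal sketch using the `rouge_score` package (pip install rouge-score).
# This is not the official perl ROUGE script, so the scores will not exactly
# match the numbers reported in the paper.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the gold-standard reference summary"
prediction = "the model-generated summary"

scores = scorer.score(reference, prediction)
print({name: round(score.fmeasure * 100, 2) for name, score in scores.items()})
```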