Kane et al. (2020)#

Publication#

NUBIA: NeUral Based Interchangeability Assessor for Text Generation

Repositories#

https://github.com/wl-research/nubia

Available Models#

Nubia
- Description: A learned text generation evaluation metric
- Name: kane2020-nubia
- Usage: Include a small snippet for how to use the model
```
from repro.models.kane2020 import NUBIA
model = NUBIA()
inputs = [
    {"candidate": "The candidate text", "references": ["The reference text"]}
]
macro, micro = model.predict_batch(inputs)
```
  macro is the Nubia score averaged over the inputs, and micro is the Nubia score per-input.

Implementation Notes#

The implementation does not support using a GPU
The metric only supports a single reference, so the length of references must be 1.

Docker Information#

Image name: danieldeutsch/kane2020:1.0
Build command:
```
repro setup kane2020 [--silent]
```
Requires network: No

Testing#

repro setup kane2020
pytest models/kane2020/tests

Status#

[x] Regression unit tests pass
[x] Correctness unit tests pass
See here. We replicated the features show in an example from the original repository. However, there are additional features now and the overall score has changed.
[ ] Model runs on full test dataset
Not tested
[ ] Predictions approximately replicate results reported in the paper
Not tested
[ ] Predictions exactly replicate results reported in the paper
Not tested

Changelog#