Éva Mújdricza-Maydt
OpenSubtitles corpus
315 German-English movie text pairs from the OpenSubtitles2011 corpus, which were used to train and test the CRF-based sentence aligner CRFalign.
training set: 309 file pairs (7.2 MB)
test set: 6 file pairs, with gold-standard annotation (160 KB)
training and test set: 315 file pairs (7.4 MB)
readme