SummBank 1.0 contains much of the data collected at the workshop. This includes 40 news clusters in English and Chinese (with 10 documents in each language per cluster), 360 multi-document, human-written summaries, nearly 2 million single-document and multi-document extracts created by automatic and manual methods, and roughly 5500 retrievals. To make full use of this data, it helps to have the Hong Kong Newspaper Corpus (LDC corpus number LDC2000T46) available as well.
data\
    automatic\
        docjudges\                  all docjudges
        features\                   sentence features computed by the original version of MEAD
        summaries\                  automatic summaries, created by MEAD, CYL, LEXCHAIN, etc.
    clusters\
        alignments\                 sentence by sentence English->Chinese alignment information
        chinese\                    Chinese clusters
        english\                    English clusters
        single-doc-cluster-files\   cluster files for all single documents
    manual\
        manual_extracts\            extracts based on human judgements of sentence relevance
        manual_summaries\           multi-document summaries written by humans
tools\
    MEAD 3.07
    dtder.pl                        script for setting dtds locally
documentation\
    jhufinalreport                  Final report from the workshop
    SummBank                        documentation for this data
dtd\                                all dtds
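
As a quick sanity check after unpacking, the listing above can be turned into a small script that reports which of the expected directories are missing. This is only a sketch: the nesting of the paths is an assumption inferred from the order of the entries, and `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, not part of the distribution or of MEAD.

```python
from pathlib import Path

# Relative directories taken from the listing above. The nesting is an
# assumption inferred from the order of the entries, not a documented fact.
EXPECTED_DIRS = [
    "data/automatic/docjudges",
    "data/automatic/features",
    "data/automatic/summaries",
    "data/clusters/alignments",
    "data/clusters/chinese",
    "data/clusters/english",
    "data/clusters/single-doc-cluster-files",
    "data/manual/manual_extracts",
    "data/manual/manual_summaries",
    "tools",
    "documentation",
    "dtd",
]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    import sys

    # Use the first command-line argument as the corpus root, else the
    # current directory.
    corpus_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for d in missing_dirs(corpus_root):
        print("missing:", d)
```

Run with the path to the unpacked corpus as its only argument, the script prints one line per directory it could not find and nothing if the layout matches.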