SummBank

News

Introduction

SummBank 1.0 contains much of the data collected at the workshop. This includes 40 news clusters in English and Chinese (with 10 documents in each language per cluster), 360 multi-document, human-written summaries, nearly 2 million single-document and multi-document extracts created by automatic and manual methods, and roughly 5500 retrievals. To make full use of this data, it helps to also have the Hong Kong Newspaper Corpus (LDC corpus number LDC2000T46) available.

Pointers

data\
  automatic\
    docjudges\	all docjudges
    features\	sentence features computed by the original version of MEAD
    summaries\	automatic summaries, created by MEAD, CYL, LEXCHAIN, etc.
  clusters\
    alignments\	sentence by sentence English->Chinese alignment information
    chinese\	Chinese clusters
    english\	English clusters
    single-doc-cluster-files\	cluster files for all single documents
  manual\
    manual_extracts\	extracts based on human judgements of sentence relevance
    manual_summaries\	multi-document summaries written by humans
tools\
  MEAD 3.07
  dtder.pl	script for setting dtds locally
documentation\
  jhufinalreport	Final report from the workshop
  SummBank 1.0 documentation	documentation for this data
dtd\	all dtds
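
As a quick check after unpacking, the directory names listed above can be located with a short script. This is a minimal sketch, not part of the SummBank distribution: the function names `locate_dirs` and `report`, and the root path in the usage note, are hypothetical. It searches recursively by directory name, so it makes no assumption about how the directories are nested.

```python
from pathlib import Path

# Directory names and descriptions copied from the listing above.
# Only leaf data directories are included; nesting is NOT assumed.
EXPECTED_DIRS = {
    "docjudges": "all docjudges",
    "features": "sentence features computed by the original version of MEAD",
    "summaries": "automatic summaries, created by MEAD, CYL, LEXCHAIN, etc.",
    "alignments": "sentence by sentence English->Chinese alignment information",
    "chinese": "Chinese clusters",
    "english": "English clusters",
    "single-doc-cluster-files": "cluster files for all single documents",
    "manual_extracts": "extracts based on human judgements of sentence relevance",
    "manual_summaries": "multi-document summaries written by humans",
    "dtd": "all dtds",
}


def locate_dirs(root, expected=EXPECTED_DIRS):
    """Map each expected directory name to the first matching path, or None."""
    root = Path(root)
    found = {name: None for name in expected}
    if not root.is_dir():
        return found
    # Sorted traversal keeps the result deterministic across platforms.
    for path in sorted(root.rglob("*")):
        if path.is_dir() and path.name in found and found[path.name] is None:
            found[path.name] = path
    return found


def report(root):
    """Print one line per expected directory: its location or 'MISSING'."""
    for name, path in locate_dirs(root).items():
        status = str(path) if path is not None else "MISSING"
        print(f"{name:28s} {status}  ({EXPECTED_DIRS[name]})")
```

Calling `report("path/to/SummBank")` with the actual unpack location (the path shown here is a placeholder) prints where each directory was found, or `MISSING` if it was not.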
