November 21, Wednesday
12:00 – 14:00

Text to Text Generation: Review of the Field and Open Issues
Bio-Informatics seminar
Lecturer : Dr. Michael Elhadad
Lecturer homepage : http://www.cs.bgu.ac.il/~elhadad/
Affiliation : CS, BGU
Location : 202/37
Host : Student Seminar
Text remains the main medium for conveying information and knowledge. Machine-readable text has become plentiful and accessible through search engines. Many applications are concerned with transforming one text into another. The field that studies such transformations is called text to text generation (T2T). T2T is a sub-field of natural language generation (NLG), and stands in contrast with data to text (D2T) generation (where the input data is any non-textual representation).

In T2T, textual units (sentences, clauses, phrases) are extracted from one context and recombined into a new text. As in NLG in general, one can split the overall task of generation into several steps:


In T2T, "content" is encoded as textual units, generally called "SCUs" (Shared Content Units). The input to a T2T system includes a collection of texts that are related in topic, for example a collection of news reports describing the same event (from different sources or published at different times). Content selection and organization in D2T applications are generally related to knowledge representation and inferencing. In T2T, they are related to Information Extraction, which includes named entity recognition, coreference resolution, entity identification, relation identification and scenario identification.
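
As a rough illustration of content selection over a collection of related texts, the sketch below ranks sentences by how widely their words are shared across the input documents. It is a toy heuristic written for this announcement, not the method presented in the lecture: the function names (`split_sentences`, `content_words`, `select_units`), the length-based word filter and the scoring rule are all assumptions, and real systems rely on Information Extraction rather than word overlap.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence, min_len=4):
    """Lowercased alphabetic tokens of at least min_len characters.

    The length cutoff is a crude stand-in for a stopword list (assumption).
    """
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) >= min_len}


def select_units(documents, top_k=2):
    """Return the top_k sentences whose words are most shared across documents."""
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for doc in documents:
        words = set()
        for sentence in split_sentences(doc):
            words |= content_words(sentence)
        doc_freq.update(words)

    scored = []
    for doc_id, doc in enumerate(documents):
        for sentence in split_sentences(doc):
            words = content_words(sentence)
            if not words:
                continue
            # Score: average document frequency of the sentence's words.
            score = sum(doc_freq[w] for w in words) / len(words)
            scored.append((score, doc_id, sentence))

    # Highest score first; ties broken by document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, _, sentence in scored[:top_k]]


if __name__ == "__main__":
    reports = [
        "The earthquake struck the coastal city on Monday. Officials closed the harbor.",
        "A strong earthquake struck the city early Monday. Schools will reopen next week.",
        "Rescue teams reached the city after the earthquake struck. Donations are arriving.",
    ]
    for unit in select_units(reports, top_k=2):
        print(unit)
```

On the three invented news snippets, the sentences that report the event common to all sources rank above the sentences that appear in only one report, which is the intuition behind treating shared textual units as the "content" to select.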

Realization is the linguistic component of generation, and is organized around the following steps:

In the lecture, I will review sample T2T applications and the corresponding methods. I will focus on the type of knowledge required to perform T2T and how it can be acquired. I will end with a review of open issues and possible topics for further research.