January 6, Tuesday
12:00 – 14:00

Dataless classification
Computer Science seminar
Lecturer : Lev-Arie Ratinov
Affiliation : University of Illinois at Urbana-Champaign
Location : 202/37
Host : Dr. Michael Elkin
Traditionally, text categorization has been studied as the problem of training a classifier using labeled data. However, people can categorize documents into named categories without any explicit training because we know the meaning of category names. In this paper, we introduce Dataless Classification, a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts. We propose a model for dataless classification and show that the label name alone is often sufficient to induce classifiers. Using Wikipedia as our source of world knowledge, we get 85.29% accuracy on tasks from the 20 Newsgroups dataset and 88.62% accuracy on tasks from a Yahoo! Answers dataset without any labeled or unlabeled data.

Short Bio: Lev Ratinov is a PhD candidate at the University of Illinois at Urbana-Champaign. He has worked on Machine Learning in Natural Language Processing and Information Extraction and has published papers in several international conferences, including "Dataless Classification" (AAAI08), "Learning and Inference with Constraints" (AAAI08), and "Guiding Semi-Supervision with Constraint-Driven Learning" (ACL07).