Reviews

This page contains the reviews for the short paper (now a tech report) "Task-specific Word-Clustering for Part-of-Speech Tagging".

Dear Dr. Yoav Goldberg:

We are sorry to inform you that the following submission 
was not selected by the program committee to appear at 
ACL 2012: 

      Task-specific Word-Clustering for Part-of-Speech
           Tagging

The selection process was very competitive. Due to time 
and space limitations, we could only choose a small number 
of the submitted papers to appear on the program.  Nonetheless, 
I still hope you can attend the conference. 

I have enclosed the reviewer comments for your perusal.

If you have any additional questions, please feel free 
to get in touch.

Best Regards,
Chin-Yew and Miles, Program Chairs, ACL 2012

============================================================================ 
ACL 2012 Reviews for Submission #2359
============================================================================ 

Title: Task-specific Word-Clustering for Part-of-Speech Tagging

Authors: Yoav Goldberg
============================================================================
                            REVIEWER #1
============================================================================ 


---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

                         APPROPRIATENESS: 4
                                 CLARITY: 4
              ORIGINALITY/INNOVATIVENESS: 3
                   SOUNDNESS/CORRECTNESS: 4
                   MEANINGFUL COMPARISON: 2
                               SUBSTANCE: 3
              IMPACT OF IDEAS OR RESULTS: 3
                           REPLICABILITY: 4
         IMPACT OF ACCOMPANYING SOFTWARE: 1
          IMPACT OF ACCOMPANYING DATASET: 1
                          RECOMMENDATION: 3
                     REVIEWER CONFIDENCE: 4


---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The results are interesting but the paper seems like a variation of the
Toutanova et al. (2003, HLT-NAACL) paper, with a task-specific twist. The use
of self-training is interesting: the reliance on a baseline tagger would seem
to give rise to a rich-get-richer scheme, which seems to work well, perhaps
because of the relatively low degree of POS ambiguity in the tested languages. 

My main criticism of the paper has to do with the Results section. Improvement
over baseline is obvious, but the authors should have compared their results
against competing models such as Toutanova et al.

============================================================================
                            REVIEWER #2
============================================================================ 


---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

                         APPROPRIATENESS: 5
                                 CLARITY: 4
              ORIGINALITY/INNOVATIVENESS: 3
                   SOUNDNESS/CORRECTNESS: 4
                   MEANINGFUL COMPARISON: 4
                               SUBSTANCE: 4
              IMPACT OF IDEAS OR RESULTS: 4
                           REPLICABILITY: 5
         IMPACT OF ACCOMPANYING SOFTWARE: 1
          IMPACT OF ACCOMPANYING DATASET: 1
                          RECOMMENDATION: 4
                     REVIEWER CONFIDENCE: 4


---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper studies feature-induction-style semi-supervised POS tagging.
Typically, this type of semi-supervised learning method employs an unsupervised
procedure to harvest string- or word-level knowledge from large-scale unlabeled
data and uses that knowledge to design new features for an NLP task. In this
paper, the impact of task-specific word clustering (the knowledge) on POS
tagging (the task) is evaluated. Unlike previous work on such "semi-supervised"
NER and dependency parsing, the word clusters are computed from auto-analyzed data.
The basic idea is similar to the following two papers (which are not cited):

Improving Dependency Parsing with Subtrees from Auto-Parsed Data (EMNLP-2009)
Improving Chinese Word Segmentation and POS Tagging with Semi-supervised
Methods Using Large Auto-Analyzed Data (IJCNLP-2012)

Experiments indicate that word clusters can enhance a (very) weak POS tagger.
Future readers may be more interested in whether or not word clusters can
enhance a strong POS tagger. If the answer is yes, this technique is more
useful for NLP applications. Unfortunately, this paper does not provide such
experiments.

============================================================================
                            REVIEWER #3
============================================================================ 


---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

                         APPROPRIATENESS: 5
                                 CLARITY: 3
              ORIGINALITY/INNOVATIVENESS: 4
                   SOUNDNESS/CORRECTNESS: 5
                   MEANINGFUL COMPARISON: 5
                               SUBSTANCE: 3
              IMPACT OF IDEAS OR RESULTS: 3
                           REPLICABILITY: 5
         IMPACT OF ACCOMPANYING SOFTWARE: 1
          IMPACT OF ACCOMPANYING DATASET: 1
                          RECOMMENDATION: 4
                     REVIEWER CONFIDENCE: 2


---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper describes experiments that test whether clustering features, based
on the output of a part-of-speech tagger, can aid in improving tagging
accuracy. A short but decent overview of related work is given in the
introduction. The technique itself is not particularly exciting, but its
application to the problem at hand is novel enough to be of interest to the ACL
community. 

My biggest issue with this paper is that the author(s) decided to use a rather
weak baseline tagger in their experiment. They argue that "Our primary interest
in this work is not in demonstrating state-of-the-art tagging accuracies on the
WSJ corpus but rather examining the contributions of different cluster features
to the tagger accuracy on diverse corpora.". In this sense, the paper indeed
successfully demonstrates that clustering features can aid tagging. But it
seems strange to "ignore" the existence of state-of-the-art taggers. The
bottleneck for many languages lies not in the unavailability of data-driven
techniques, but in the management of data sources. The paper does not make
clear what advantages the proposed technique offers for developing accurate
taggers when linguistic resources are scarce.

Minor comments:
- fix "a ... criteria" to "a ... criterion" or "criteria"
- Switch Table 2 and 1 around
- "using the additional ... improve[s] the results"