SYNOPSIS

difficult [ -n N ] [ --verbstats statsfile ] [file1 ...]


DESCRIPTION

Given a list of Penn Treebank files, extract the trees that seem the most difficult, for use as benchmark examples for parsing.

Trees are ranked by each of the following heuristics, and the trees with the largest values for any one of them are considered difficult:

Tree height.

Number of top-level constituents.

Ambiguity of the top-level verb, according to Verbnet.
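
The first two heuristics can be illustrated with a minimal Python sketch. This is not the program's own code. It assumes trees in the usual bracketed Penn Treebank notation, counts a word as height 0, and treats the children of the outermost labelled node as the top-level constituents. The function names parse_tree, unwrap, height and top_level_constituents are hypothetical.

```python
def parse_tree(text):
    """Parse one bracketed tree into (label, children); leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def unwrap(tree):
    """Skip the unlabelled wrapper that treebank files often put around a tree."""
    label, children = tree
    if label == "" and len(children) == 1 and not isinstance(children[0], str):
        return children[0]
    return tree


def height(tree):
    """Longest path from this node down to a word (assumed definition)."""
    if isinstance(tree, str):
        return 0
    _, children = tree
    return 1 + max((height(child) for child in children), default=0)


def top_level_constituents(tree):
    """Number of direct children of the outermost labelled node."""
    _, children = tree
    return len(children)


example = unwrap(parse_tree("( (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .)))"))
example_height = height(example)
example_width = top_level_constituents(example)
```

For the example sentence, the height is 3 (S, NP, DT, then the word) and there are 3 top-level constituents (NP, VP and the final period).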


OPTIONS

h
help
?
Print usage instructions to STDERR.

man
Print the manual page to STDOUT.

verbose
Increase verbosity level.

n N
Extract up to N of the highest-rated trees for each heuristic. Defaults to 15.

verbstats statsfile
Read the Verbnet stats from statsfile, as output by verbstat. Defaults to verbnet.stats.pl.
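
The effect of the n option can be pictured as a separate top-N selection for each heuristic. The Python sketch below is illustrative only and does not reflect how the program stores its scores. The function select_difficult, the tree identifiers and the heuristic names are hypothetical.

```python
import heapq


def select_difficult(scores, n=15):
    """Return, for each heuristic, up to n tree ids with the highest score.

    scores maps a tree id to a dict of heuristic name -> numeric score
    (an assumed layout, chosen only for this sketch).
    """
    heuristics = sorted({name for per_tree in scores.values() for name in per_tree})
    return {
        name: heapq.nlargest(
            n,
            (tid for tid, per_tree in scores.items() if name in per_tree),
            key=lambda tid: scores[tid][name],
        )
        for name in heuristics
    }


scores = {
    "t1": {"height": 5, "top_level": 6},
    "t2": {"height": 9, "top_level": 4},
    "t3": {"height": 7, "top_level": 3},
}
selected = select_difficult(scores, n=2)
```

Because each heuristic is ranked on its own, the same tree (t2 here) can be picked by more than one heuristic, so the number of distinct trees extracted may be smaller than N times the number of heuristics.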


BUGS

Lookup in Verbnet requires the base form of a verb (VB), but some verb constituents appear in conjugated forms (VBP, VBZ, VBN, VBD, VBG).

Verb deconjugation is currently handled by Lingua::EN::Infinitive. If the resulting candidate does not appear in the ambiguity stats, this can mean either that the deconjugation failed or that the base form of the verb is actually absent from the Verbnet database.
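
The ambiguity described above can be reproduced with a small Python sketch. It does not use Lingua::EN::Infinitive. The function naive_base_forms is a deliberately crude, hypothetical stand-in that only strips a few regular suffixes, and the stats dictionary (verb to ambiguity count) is an assumed layout, not the actual format of the verbstat output.

```python
def naive_base_forms(word):
    """Hypothetical stand-in for a real deconjugator: strips regular suffixes."""
    word = word.lower()
    candidates = [word]
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            candidates.append(stem)
            candidates.append(stem + "e")  # e.g. "making" -> "make"
    return candidates


def lookup_ambiguity(word, tag, stats):
    """Return the ambiguity score for a verb, or None if no candidate is found."""
    if tag == "VB":
        candidates = [word.lower()]  # already a base form
    else:
        candidates = naive_base_forms(word)
    for candidate in candidates:
        if candidate in stats:
            return stats[candidate]
    return None  # failed deconjugation OR verb missing from the stats


stats = {"make": 3, "run": 5, "go": 7}  # made-up numbers for illustration
found_regular = lookup_ambiguity("making", "VBG", stats)
missed_irregular = lookup_ambiguity("went", "VBD", stats)  # "go" is present
missed_absent = lookup_ambiguity("blorp", "VB", stats)     # really absent
```

Both failing lookups return None, although one is a deconjugation failure ("went" is never reduced to "go") and the other is a verb that is genuinely missing from the stats. The caller cannot tell the two cases apart from the result alone.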


SEE ALSO

Lingua::Treebank, verbstat, Lingua::EN::Infinitive


AUTHOR

Vassilii Khachaturov <vassilii@tarunz.org>


LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html