difficult [ -n N ] [ --verbstats statsfile ] [file1 ...]
Given a list of Penn Treebank files, extract the trees that seem the most difficult, for further use as benchmarking examples for parsing.
The heuristics pinpoint the difficult trees by selecting those that rank highest on either of the following characteristics:
Deconjugated verbs (VB) are needed for lookup in Verbnet; however, some verb constituents come in conjugated forms (VBP, VBZ, VBN, VBD, VBG).
Verb deconjugation is presently handled by Lingua::EN::Infinitive. If the resulting candidate does not appear in the ambiguity stats lists, either the deconjugation failed or the verb's base form is actually absent from the Verbnet database.
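The lookup logic described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the program's actual code: `naive_infinitive` is a hypothetical stand-in for Lingua::EN::Infinitive that strips only a few regular suffixes, and the `stats` dictionary is an assumed shape for the ambiguity stats, not the real statsfile format.

```python
# Penn Treebank tags for conjugated verb forms, as listed above.
CONJUGATED_TAGS = {"VBP", "VBZ", "VBN", "VBD", "VBG"}


def naive_infinitive(word):
    """Hypothetical stand-in for a real deconjugator: strips a few regular suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def lookup_verb(tag, word, stats):
    """Return (base_form_candidate, stats_entry_or_None) for one verb leaf."""
    word = word.lower()
    # Only conjugated forms need deconjugation; VB is already the base form.
    base = naive_infinitive(word) if tag in CONJUGATED_TAGS else word
    return base, stats.get(base)


# Assumed, made-up ambiguity stats: base form -> some ambiguity count.
stats = {"walk": 3, "run": 7}

for tag, word in [("VBD", "walked"), ("VB", "run"), ("VBG", "running"), ("VBZ", "glorps")]:
    base, entry = lookup_verb(tag, word, stats)
    if entry is None:
        # Either deconjugation failed or the base form is absent from the stats.
        print(f"{word}/{tag}: no entry for candidate '{base}'")
    else:
        print(f"{word}/{tag}: base '{base}', entry {entry}")
```

In this sketch, "running" misses because the naive stripper yields the wrong candidate "runn" (a deconjugation failure), while "glorps" misses because its candidate "glorp" is simply not in the stats. From the missing entry alone the two cases look identical, which is the ambiguity noted above.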
Lingua::Treebank, verbstat, Lingua::EN::Infinitive
Vassilii Khachaturov <vassilii@tarunz.org>
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.