Unable to suppress Stanford parser warnings

Time: 2016-03-10 10:00:03

Tags: stanford-nlp

When using the Stanford parser's TokenizerFactory, I made sure to set the option "untokenizable=noneDelete", but I still get warnings. What could be the problem?

public static List<Tree> findHeadNounPhrases(List<String> unites)
{
    List<Tree> nps = new ArrayList<Tree>();
    for(String sentence : unites)
    {

        HeadFinder hf = new PennTreebankLanguagePack().headFinder();
        StringReader reader = new StringReader(sentence);
        TokenizerFactory<CoreLabel> tokenizerFactory =
                PTBTokenizer.factory(new CoreLabelTokenFactory(), "untokenizable=noneDelete");
        tokenizerFactory.setOptions("untokenizable=noneDelete");
        Tokenizer<CoreLabel> tok = tokenizerFactory.getTokenizer(reader);
        List<CoreLabel> rawWords2 = tok.tokenize();
        Tree tree = lp.apply(rawWords2);
        ...
}

I get the following warnings:

Mar 10, 2016 11:13:51 AM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: ି (U+B3F, decimal: 2879)
Mar 10, 2016 11:13:51 AM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: ି (U+B3F, decimal: 2879)
Mar 10, 2016 11:13:56 AM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable:  (U+89, decimal: 137)
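
One possible stopgap, separate from the tokenizer option itself: the timestamped two-line format of these warnings looks like default java.util.logging output, so raising that logger's level may hide them. Below is a minimal sketch, assuming the logger is named after the class shown in the log (edu.stanford.nlp.process.PTBLexer). That name is a guess from the log lines, not confirmed against the library source, and the QuietPtbLexer class and silenceWarnings method are hypothetical helpers:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Stanford NLP.
final class QuietPtbLexer {

    // ASSUMPTION: the logger name matches the class name printed in the warnings.
    static final String ASSUMED_LOGGER_NAME = "edu.stanford.nlp.process.PTBLexer";

    // Keep a strong reference: java.util.logging holds loggers weakly, so a level
    // set on an otherwise unreferenced logger can be lost after garbage collection.
    private static final Logger PTB_LEXER_LOGGER = Logger.getLogger(ASSUMED_LOGGER_NAME);

    private QuietPtbLexer() {
    }

    // Raise the threshold so WARNING records from this logger are dropped.
    static void silenceWarnings() {
        PTB_LEXER_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silenceWarnings();
        // After the call, WARNING is below the logger's threshold.
        System.out.println(PTB_LEXER_LOGGER.isLoggable(Level.WARNING)); // prints "false"
    }
}
```

The intended use would be to call QuietPtbLexer.silenceWarnings() once before building the tokenizer. This only hides the messages; it does not change how untokenizable characters are handled, and it does not explain why noneDelete is not taking effect.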

0 answers:

No answers