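How can I use the Illinois Chunker with a sentence as input?

Asked: 2015-03-24 19:43:00

Tags: java nlp

I am trying to use the Illinois Chunker on a per-sentence basis. The entry point that ships with it is the following code: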

public class ChunksAndPOSTags {
    public static void main(String[] args) {
        String filename = null;
        try {
            filename = args[0];
            if (args.length > 1) throw new Exception();
        }
        catch (Exception e) {
            System.err.println("usage: java edu.illinois.cs.cogcomp.lbj.chunk.ChunksAndPOSTags <input file>");
            System.exit(1);
        }

        Chunker chunker = new Chunker();
        Parser parser = new PlainToTokenParser(
            new WordSplitter(new SentenceSplitter(filename)));
        String previous = "";
        for (Word w = (Word) parser.next(); w != null; w = (Word) parser.next()) {
            String prediction = chunker.discreteValue(w);
            if (prediction.startsWith("B-") ||
                prediction.startsWith("I-") &&
                !previous.endsWith(prediction.substring(2)))
                System.out.print("[" + prediction.substring(2) + " ");
            System.out.print("(" + w.partOfSpeech + " " + w.form + ") ");
            if (!prediction.equals("O") &&
                (w.next == null                                 || 
                 chunker.discreteValue(w.next).equals("O")      || 
                 chunker.discreteValue(w.next).startsWith("B-") ||
                 !chunker.discreteValue(w.next).endsWith(prediction.substring(2))))
                System.out.print("] ");
            if (w.next == null)
                System.out.println();
            previous = prediction;
        }
    }
}

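How can we modify the code above to process one sentence at a time instead of reading a text file?

1 Answer:

Answer 0: (score: 1)

You should create your own sentence parser that simply returns your string (your "one sentence at a time").

Here is sample code: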

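import LBJ2.parse.Parser;
import LBJ2.nlp.Sentence;

// Parser that yields a single, already-split sentence and then ends.
public class FakeSentenceSplitter implements Parser {

    private final String sentenceText;
    private boolean consumed = false;

    public FakeSentenceSplitter(String sentenceText) {
        this.sentenceText = sentenceText;
    }

    // Return the sentence once, then null so that the wrapping parsers stop.
    public Object next() {
        if (consumed) return null;
        consumed = true;
        return new Sentence(sentenceText);
    }

    // Allow the same sentence to be read again.
    public void reset() {
        consumed = false;
    }

    public void close() {
    }
}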
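
The loop in the question stops only when next() returns null, and the wrapping parsers presumably rely on the same signal, so a splitter that hands out a single sentence should return it once and then report that it is exhausted. Below is a minimal, self-contained sketch of that pattern. It does not use the real LBJ2 types: SimpleParser, OneShotSentenceParser and drain are hypothetical names introduced here for illustration, and a plain String stands in for LBJ2's Sentence.

```java
import java.util.ArrayList;
import java.util.List;

class OneShotDemo {

    // Hypothetical stand-in with the same three methods the answer's class
    // implements; this is NOT the real LBJ2.parse.Parser.
    interface SimpleParser {
        Object next();
        void reset();
        void close();
    }

    // Returns the given sentence exactly once, then null to signal end of input.
    static class OneShotSentenceParser implements SimpleParser {
        private final String sentenceText;
        private boolean consumed = false;

        OneShotSentenceParser(String sentenceText) {
            this.sentenceText = sentenceText;
        }

        @Override
        public Object next() {
            if (consumed) return null;
            consumed = true;
            return sentenceText;
        }

        @Override
        public void reset() {
            consumed = false;
        }

        @Override
        public void close() {
        }
    }

    // Drains a parser the way the question's for-loop does: until next() is null.
    static List<Object> drain(SimpleParser parser) {
        List<Object> items = new ArrayList<>();
        for (Object o = parser.next(); o != null; o = parser.next()) {
            items.add(o);
        }
        return items;
    }

    public static void main(String[] args) {
        SimpleParser parser = new OneShotSentenceParser("The quick brown fox jumps.");
        System.out.println(drain(parser).size()); // one item, then the loop ends
        parser.reset();
        System.out.println(drain(parser).size()); // readable again after reset()
    }
}
```

Without the consumed flag, next() would keep producing the same sentence and the draining loop would never terminate.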

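If you do not already have the LBJ2 package, you can download it here.

After that, use your new sentence splitter in this line, passing the sentence text in place of the file name: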

Parser parser = new PlainToTokenParser(
        new WordSplitter(new FakeSentenceSplitter(filename)));
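
When working sentence by sentence, it is often more convenient to get the bracketed result back as a String than to print it. The following sketch isolates the bracketing logic from the question's loop so it can be tried without LBJ2. It is an illustration under assumptions: formatChunks and its parallel-array inputs are hypothetical, and the tags in main are written by hand rather than predicted by the Chunker.

```java
class ChunkFormatDemo {

    // Renders one sentence as bracketed chunks, following the question's loop.
    // words, pos and tags are parallel arrays; tags use B-X / I-X / O labels.
    static String formatChunks(String[] words, String[] pos, String[] tags) {
        StringBuilder out = new StringBuilder();
        String previous = "";
        for (int i = 0; i < words.length; i++) {
            String tag = tags[i];
            String next = (i + 1 < words.length) ? tags[i + 1] : null;

            // Open a chunk on B-X, or on an I-X that does not continue
            // the previous chunk type.
            boolean opens = tag.startsWith("B-")
                    || (tag.startsWith("I-") && !previous.endsWith(tag.substring(2)));
            if (opens) {
                out.append("[").append(tag.substring(2)).append(" ");
            }

            out.append("(").append(pos[i]).append(" ").append(words[i]).append(") ");

            // Close the chunk at the end of the sentence, or when the next
            // tag is O, starts a new chunk, or has a different type.
            boolean closes = !tag.equals("O")
                    && (next == null
                        || next.equals("O")
                        || next.startsWith("B-")
                        || !next.endsWith(tag.substring(2)));
            if (closes) {
                out.append("] ");
            }

            previous = tag;
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String[] words = {"The", "dog", "barked", "."};
        String[] pos   = {"DT", "NN", "VBD", "."};
        String[] tags  = {"B-NP", "I-NP", "B-VP", "O"}; // hand-written example tags
        // prints: [NP (DT The) (NN dog) ] [VP (VBD barked) ] (. .)
        System.out.println(formatChunks(words, pos, tags));
    }
}
```

In the real program the three arrays would be filled from w.form, w.partOfSpeech and chunker.discreteValue(w) while iterating over the words that the parser returns for one sentence.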