I am trying to use the Illinois Chunker on a per-sentence basis. The entry point that ships with it is the following code:
public class ChunksAndPOSTags {
    public static void main(String[] args) {
        String filename = null;
        try {
            filename = args[0];
            if (args.length > 1) throw new Exception();
        }
        catch (Exception e) {
            System.err.println("usage: java edu.illinois.cs.cogcomp.lbj.chunk.ChunksAndPOSTags <input file>");
            System.exit(1);
        }
        Chunker chunker = new Chunker();
        Parser parser = new PlainToTokenParser(
            new WordSplitter(new SentenceSplitter(filename)));
        String previous = "";
        for (Word w = (Word) parser.next(); w != null; w = (Word) parser.next()) {
            String prediction = chunker.discreteValue(w);
            if (prediction.startsWith("B-") ||
                prediction.startsWith("I-") &&
                !previous.endsWith(prediction.substring(2)))
                System.out.print("[" + prediction.substring(2) + " ");
            System.out.print("(" + w.partOfSpeech + " " + w.form + ") ");
            if (!prediction.equals("O") &&
                (w.next == null ||
                 chunker.discreteValue(w.next).equals("O") ||
                 chunker.discreteValue(w.next).startsWith("B-") ||
                 !chunker.discreteValue(w.next).endsWith(prediction.substring(2))))
                System.out.print("] ");
            if (w.next == null)
                System.out.println();
            previous = prediction;
        }
    }
}
How can I modify the above to process one sentence at a time instead of supplying a text file?
Answer 0 (score: 1)
You should create your own sentence parser that simply returns your string (your "one sentence at a time").
Here is some sample code:
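import LBJ2.parse.Parser;
import LBJ2.nlp.Sentence;

public class FakeSentenceSplitter implements Parser {
    private final String sentenceText;
    // Tracks whether the single sentence has already been handed out.
    private boolean consumed = false;

    public FakeSentenceSplitter(String sentenceText) {
        this.sentenceText = sentenceText;
    }

    // Return the sentence once, then null, so the consuming loop can terminate.
    public Object next() {
        if (consumed) return null;
        consumed = true;
        return new Sentence(sentenceText);
    }

    // Allow the same sentence to be read again.
    public void reset() {
        consumed = false;
    }

    // Nothing to release: no file or stream is opened.
    public void close() {
    }
}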
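One thing to watch for: the loop in the question only stops when next() returns null, so a parser that wraps a single sentence presumably has to hand it out once and return null afterwards. Here is a minimal sketch of that contract which runs without LBJ2. SimpleParser is a hypothetical stand-in for LBJ2.parse.Parser, and OneShotSentenceParser and OneShotDemo are names made up for illustration; the sentence is kept as a plain String rather than an LBJ2 Sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LBJ2.parse.Parser, declared here only so the
// sketch compiles without the LBJ2 jar on the classpath.
interface SimpleParser {
    Object next();  // next item, or null when there is nothing left
    void reset();   // rewind to the beginning
}

// Hands out its one sentence exactly once, then reports exhaustion.
class OneShotSentenceParser implements SimpleParser {
    private final String sentenceText;
    private boolean consumed = false;

    OneShotSentenceParser(String sentenceText) {
        this.sentenceText = sentenceText;
    }

    public Object next() {
        if (consumed) return null;
        consumed = true;
        return sentenceText;
    }

    public void reset() {
        consumed = false;
    }
}

class OneShotDemo {
    // Drains a parser with the same loop shape as the question's code:
    // keep calling next() until it returns null.
    static List<Object> drain(SimpleParser parser) {
        List<Object> items = new ArrayList<>();
        for (Object item = parser.next(); item != null; item = parser.next()) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        SimpleParser parser = new OneShotSentenceParser("The dog barked.");
        System.out.println(drain(parser)); // one item, then the loop ends
        parser.reset();
        System.out.println(drain(parser)); // readable again after reset()
    }
}
```

If next() returned the sentence on every call instead, the drain loop above would never terminate.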
If you are not using the LBJ2 package yet, you can download it here.
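After that, use your new sentence splitter in this line, passing the sentence text itself where the file name used to go:
Parser parser = new PlainToTokenParser(
    new WordSplitter(new FakeSentenceSplitter(sentenceText)));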
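Once you work per sentence, you will probably want the bracketed result back as a string rather than printed to stdout. Below is a rough sketch of the same bracketing logic as the loop in the question, written against plain arrays so it runs without LBJ2. ChunkBrackets and render are names made up for this example; in real code the tags would come from chunker.discreteValue(w), and the forms and POS tags from w.form and w.partOfSpeech.

```java
class ChunkBrackets {
    // Renders one sentence as bracketed chunks.
    // tags are BIO labels such as "B-NP", "I-NP" or "O".
    static String render(String[] forms, String[] pos, String[] tags) {
        StringBuilder out = new StringBuilder();
        String previous = "";
        for (int i = 0; i < forms.length; i++) {
            String tag = tags[i];
            // Open a bracket on B-X, or on an I-X that does not continue
            // a chunk of the same type.
            boolean opens = tag.startsWith("B-")
                || (tag.startsWith("I-") && !previous.endsWith(tag.substring(2)));
            if (opens) {
                out.append("[").append(tag.substring(2)).append(" ");
            }
            out.append("(").append(pos[i]).append(" ").append(forms[i]).append(") ");
            if (!tag.equals("O")) {
                String next = (i + 1 < forms.length) ? tags[i + 1] : null;
                // Close the bracket when the sentence ends or the next
                // token does not continue this chunk.
                boolean closes = next == null
                    || next.equals("O")
                    || next.startsWith("B-")
                    || !next.endsWith(tag.substring(2));
                if (closes) {
                    out.append("] ");
                }
            }
            previous = tag;
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String[] forms = {"The", "dog", "barked"};
        String[] pos = {"DT", "NN", "VBD"};
        String[] tags = {"B-NP", "I-NP", "B-VP"};
        // prints: [NP (DT The) (NN dog) ] [VP (VBD barked) ]
        System.out.println(render(forms, pos, tags));
    }
}
```

Unlike the original loop, this version takes the tags up front, so the chunker would only need to be queried once per word.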