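I would like to know how to find the start and end positions of sentences within a paragraph using StanfordCoreNLP. Currently I am using DocumentPreprocessor to split the paragraph into sentences. Is it possible to get the start and end indices of where each sentence is actually located in the original text?
I am using code from another question asked here.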
String paragraph = "My 1st sentence. “Does it work for questions?” My third sentence.";
Reader reader = new StringReader(paragraph);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
List<String> sentenceList = new ArrayList<String>();
for (List<HasWord> sentence : dp) {
    // Join the tokens of each sentence back into a single string
    String sentenceString = Sentence.listToString(sentence);
    sentenceList.add(sentenceString);
}
for (String sentence : sentenceList) {
    System.out.println(sentence);
}
Taken from: How can I split a text into sentences using the Stanford parser?
Thanks
Answer 0 (score: 2)
The quick and dirty way is:
import edu.stanford.nlp.simple.*;
Document doc = new Document("My 1st sentence. “Does it work for questions?” My third sentence.");
for (Sentence sentence : doc.sentences()) {
    // Begin offset of the first token, end offset of the last token
    System.out.println(sentence.characterOffsetBegin(0) + " -- " + sentence.characterOffsetEnd(sentence.length() - 1));
}
Otherwise, you can extract the CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation from a CoreLabel and use them to find the token's offsets in the original text.
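The CoreLabel route could look roughly like the sketch below. It assumes the full StanfordCoreNLP pipeline with only the tokenize and ssplit annotators, rather than the DocumentPreprocessor from the question. The class name SentenceOffsets and the helper sentenceSpans are made up for illustration and are not part of CoreNLP. Each sentence's span is taken from the begin offset of its first token and the end offset of its last token.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class SentenceOffsets {

    // Hypothetical helper (not a CoreNLP API): returns one {begin, end} pair
    // per sentence, as character offsets into the original text.
    static List<int[]> sentenceSpans(String text) {
        Properties props = new Properties();
        // Tokenization and sentence splitting are enough to get offsets
        props.setProperty("annotators", "tokenize,ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        List<int[]> spans = new ArrayList<>();
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            CoreLabel first = tokens.get(0);
            CoreLabel last = tokens.get(tokens.size() - 1);
            // Offsets are stored on each token as annotations
            int begin = first.get(CoreAnnotations.CharacterOffsetBeginAnnotation.class);
            int end = last.get(CoreAnnotations.CharacterOffsetEndAnnotation.class);
            spans.add(new int[] {begin, end});
        }
        return spans;
    }

    public static void main(String[] args) {
        String paragraph = "My 1st sentence. “Does it work for questions?” My third sentence.";
        for (int[] span : sentenceSpans(paragraph)) {
            // Slice the original text with the offsets to check them
            System.out.println(span[0] + " -- " + span[1] + ": "
                    + paragraph.substring(span[0], span[1]));
        }
    }
}
```

Slicing the original paragraph with substring, as in main, is a simple way to confirm that the offsets refer to the untokenized text and not to the normalized token strings.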
Answer 1 (score: 0)
For an example of retrieving CharacterOffsetEndAnnotation, see https://www.programcreek.com/java-api-examples/?api=edu.stanford.nlp.ling.CoreLabel