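I want to change the end-of-sentence delimiters used by OpenNLP's SentenceDetectorME. I am using opennlp 1.5.3. The stock version only detects sentences separated by '.', so my goal is to add other sentence delimiters such as ';', '!' and '?' by passing a char array eos[] to SentenceDetectorFactory. I have read that you have to use the .train method of SentenceDetectorME, but I don't understand how, because it is static and needs a training model. Any suggestions?
My code: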
import java.io.*;
import opennlp.tools.sentdetect.*;

public class SenTest {
    public static void main(String[] args) throws IOException {
        String paragraph = "12oz bottle poured into a tulip. Pleasing aromas of citrus rind, lemongrass, peaches, and toasted caramel are picked up from the start. After it settles a bit, more of a fresh baked bread crust and tangerine comes through, and even later, the bread crust turns more towards a blackened pizza crust. It pours a slightly hazy copper-orange color with a creamy white head that retains well; it leaves a thick puffy ring with a creamy island and a decent, messy lace along the glass. Great balance between medium high levels of sweet and bitter. The texture is creamy on the palate with a body towards the higher end of medium. The carbonation is a touch effervescent or fizzy, but overall, soft. There’s a very pronounced grapefruit tartness up front, but it mellows quickly after the first few sips. It finishes with a zesty combination of lemongrass, caramel, and stonefruit. The aftertaste is primarily sweet, overripe tangerines and it’s peel with a tart grapefruit bitter lingering in the mouth. Overall very refreshing, straddles the line between IPA and APA.";
        // End-of-sentence characters I would like the detector to use (not passed to it yet)
        char[] eos = {';', '.', '!', '?'};
        int counter = 0;
        // Always start with a model; a model is learned from training data
        try (InputStream is = new FileInputStream(System.getProperty("user.dir") + "/lib/en-sent.bin")) {
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String s : sentences) {
                counter++;
                System.out.println("Frase numero " + counter + ": " + s);
            }
        } // the model stream is closed here, even if loading fails
    }
}
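Answer 0 (score: 0)
I think you have misunderstood how training works.
You need to supply a large number of sentences/paragraphs that contain the characters you want detected ('!', ';' and so on). This is because opennlp looks at features of the text around each candidate character to decide whether it is a real sentence split or just punctuation inserted for some other reason.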
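Take the following example:
Helen is thirty ;;years;; old; she is really young!
In this line, ;;years;; is just some markup and should not be detected as a sentence split. (How often the ;; pattern occurs in the training data will determine whether a split is made there.)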
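To make the training step concrete, here is a minimal sketch of how a model with custom end-of-sentence characters might be trained. It assumes the opennlp 1.5.3 API (later releases changed the stream constructors), and the file name train.txt is hypothetical: it stands for your own training file with one sentence per line, containing many real examples of ';', '!' and '?' both as sentence ends and as ordinary punctuation.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

import opennlp.tools.sentdetect.SentenceDetectorFactory;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.sentdetect.SentenceSample;
import opennlp.tools.sentdetect.SentenceSampleStream;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class CustomEosTrainer {
    // Characters the trained detector should treat as candidate sentence ends
    static final char[] EOS = {'.', ';', '!', '?'};

    // trainingFile is a hypothetical path: one sentence per line
    static SentenceModel train(String trainingFile) throws IOException {
        InputStream in = new FileInputStream(trainingFile);
        try {
            ObjectStream<String> lines =
                    new PlainTextByLineStream(in, Charset.forName("UTF-8"));
            ObjectStream<SentenceSample> samples = new SentenceSampleStream(lines);
            // language, useTokenEnd, abbreviation dictionary (none here), eos characters
            SentenceDetectorFactory factory =
                    new SentenceDetectorFactory("en", true, null, EOS);
            // train is static because it builds a new model; it does not modify a detector
            return SentenceDetectorME.train(
                    "en", samples, factory, TrainingParameters.defaultParams());
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String trainingFile = args.length > 0 ? args[0] : "train.txt"; // assumed file name
        SentenceDetectorME detector = new SentenceDetectorME(train(trainingFile));
        for (String s : detector.sentDetect("It retains well; it leaves a ring. Great balance!")) {
            System.out.println(s);
        }
    }
}
```

Whether the detector actually splits at a given ';' still depends on what it learned from the training data, so a small or unrepresentative file is likely to give poor results.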
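In your example you could also just use string.split() and split on the eos characters you pass in, but that means you would also split the sentence above at the ;; pattern.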
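For comparison, a naive split might look like the sketch below. The class and method names are made up for illustration, and the regular expression hard-codes the four characters from the question. It cuts after every one of them with no look at the context.

```java
public class NaiveEosSplit {
    // Split after any of . ; ! ? and swallow the whitespace that follows
    private static final String EOS_SPLIT = "(?<=[.;!?])\\s*";

    static String[] naiveSplit(String text) {
        return text.split(EOS_SPLIT);
    }

    public static void main(String[] args) {
        String text = "Helen is thirty ;;years;; old; she is really young!";
        for (String part : naiveSplit(text)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Run on a sentence like the Helen one, this cuts the ;;years;; markup at every ';', which is exactly the kind of false split a trained model is meant to avoid.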