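I am trying to train a custom NER model with OpenNLP. When I pass a sentence in to predict entities, it only picks up the first word of the sentence. I can't tell what I am doing wrong.
Here is the training code: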
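import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class OpenNLPNER {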
public static void main(String[] args) {
train("en", "technology", "D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\technology.train", "D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\techno1.bin");
}
public static String train(String lang, String entity, InputStreamFactory inputStream, FileOutputStream modelStream) {
Charset charset = Charset.forName("UTF-8");
TokenNameFinderModel model = null;
ObjectStream<NameSample> sampleStream = null;
try {
ObjectStream<String> lineStream = new PlainTextByLineStream(inputStream, charset);
sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
// Use the method parameters rather than hard-coded language and entity type
model = NameFinderME.train(lang, entity, sampleStream, TrainingParameters.defaultParams(),
nameFinderFactory);
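} catch (IOException io) {
// Report the failure instead of silently swallowing it (also covers FileNotFoundException)
io.printStackTrace();
return "Something goes wrong with training module.";
} finally {
// sampleStream is still null if opening the training file failed
if (sampleStream != null) {
try {
sampleStream.close();
} catch (IOException io) {
io.printStackTrace();
}
}
}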
// try-with-resources closes the output stream even if serialization fails
try (BufferedOutputStream modelOut = new BufferedOutputStream(modelStream)) {
model.serialize(modelOut);
} catch (IOException io) {
io.printStackTrace();
return "Something goes wrong with training module.";
}
return "Model trained and saved.";
}
public static String train(String lang, String entity, final String taggedCoprusFile,
String modelFile) {
try {
InputStreamFactory inputStream = new InputStreamFactory() {
// Open a fresh stream on every call, using the path passed in by the caller
public InputStream createInputStream() throws IOException {
return new FileInputStream(taggedCoprusFile);
}
};
// InputStreamFactory temp= new InputStream("D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\en-ner-medical.train") ;
return train(lang, entity, inputStream,
new FileOutputStream(modelFile));
} catch (Exception e) {
e.printStackTrace();
}
return "Something goes wrong with training module.";
}
}
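Now I load the saved model. When I pass in a sentence to predict the output, it only picks the first word, and only when the first letter of that word is uppercase.
Here is the code that loads the model and runs the prediction: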
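import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class nameEntity {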
public static void main(String[] args) throws Exception {
InputStream modelIn = new FileInputStream( "D:/main/techno.bin");
InputStream tokenModelIn = new FileInputStream("C:/openNLP/en-token.bin");
try {
TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
NameFinderME nameFinder = new NameFinderME(model);
//Instantiating the NameFinder class
//nameFinder = new NameFinderME(model);
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
//Instantiating the TokenizerME class
TokenizerME tokenizer = new TokenizerME(tokenModel);
//Getting the sentence in the form of String array
String sentence = "Camel is a Java software";
String tokens[] = tokenizer.tokenize(sentence);
//Finding the names in the sentence
nameFinder.clearAdaptiveData();
Span nameSpans[] = nameFinder.find(tokens);
System.out.println(sentence);
//Printing the spans of the names in the sentence
for(Span s: nameSpans) {
System.out.println(s.toString()+" "+tokens[s.getStart()]);
}
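} finally {
// Release both model files once prediction is done
modelIn.close();
tokenModelIn.close();
}
}
}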
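One thing worth noting about the loop above: it prints `tokens[s.getStart()]`, which is only the first token of each span, so a multi-token entity would still show up as a single word. A span's end index is exclusive, so the full text runs from `getStart()` up to `getEnd() - 1`. A minimal sketch using plain Java (the class `SpanText`, the method `covered`, and the sample tokens are hypothetical, for illustration only):

```java
import java.util.Arrays;

public class SpanText {

    // Joins tokens[start] .. tokens[end - 1]; end is exclusive, like Span.getEnd().
    static String covered(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        // Hypothetical tokens and a hypothetical span [0..2) covering two tokens
        String[] tokens = {"Apache", "Camel", "is", "a", "Java", "software"};
        System.out.println(covered(tokens, 0, 2));
    }
}
```

In the prediction loop this would be called as `covered(tokens, s.getStart(), s.getEnd())`. OpenNLP's own `Span.spansToStrings(nameSpans, tokens)` should serve the same purpose.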
Training file:
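Abdera implementation of the Atom Syndication Format and Atom Publishing Protocol, Accumulo secure implementation of BigTable, ActiveMQ message broker supporting different communication protocols and clients, including a full Java Message Service (JMS) 1.1 client. Allura Python-based open source implementation of a software forge. Ant Java-based build tool, Apache Arrow "a high-performance cross-system data layer for columnar in-memory analytics". APR Apache Portable Runtime, a portability library written in C, Archiva Build Artifact Repository Manager, Apache Beam, an uber-API for big data, Beehive Java visual object model. Bloodhound defect tracker based on Trac[3]. Calcite dynamic data management framework, Camel declarative routing and mediation rules engine which implements the Enterprise Integration Patterns using a Java-based domain-specific language.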
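For comparison, the OpenNLP name finder trainer reads one whitespace-tokenized sentence per line, with each entity wrapped as `<START:technology> ... <END>`. The excerpt above shows no such tags, though they may simply have been lost when posting. Below is a minimal sketch of producing lines in that format with plain Java. The class `TrainingLineAnnotator`, the method `annotate`, and the sample sentences are hypothetical, and the sketch only handles single-token entity names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

public class TrainingLineAnnotator {

    // Wraps every whitespace-separated token found in entityNames
    // in OpenNLP name tags: <START:type> token <END>
    static String annotate(String sentence, String entityType, Set<String> entityNames) {
        StringJoiner line = new StringJoiner(" ");
        for (String token : sentence.trim().split("\\s+")) {
            if (entityNames.contains(token)) {
                line.add("<START:" + entityType + "> " + token + " <END>");
            } else {
                line.add(token);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical entity list and sentences, for illustration only
        Set<String> names = new HashSet<>(Arrays.asList("Camel", "ActiveMQ"));
        System.out.println(annotate("Camel is a declarative routing engine .", "technology", names));
        System.out.println(annotate("Messages are sent through ActiveMQ .", "technology", names));
    }
}
```

The second sample sentence places the entity in the middle of the sentence on purpose: if every annotated name in the training data sits at the start of a line, that may be one reason a model only ever tags the first token.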
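Output when the first letter of the first word is uppercase:
Camel is a Java software
[0..1) technology Camel
Output when the first letter of the first word is lowercase:
camel is a Java software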
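What happens is this: if the first word of the sentence appears in the training file, the output is that first word, and only if it starts with an uppercase letter.
I have tried training the model with OpenNLP tools versions 1.6.0 and 1.7.2.
Could you tell me where the problem is? Am I missing some rule?
Thanks in advance.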