I want to train a model on a corpus of Indian names:
class NameTraining
{
    public static void TrainNames() throws IOException
    {
        Charset charset = Charset.forName("UTF-8");
        FileReader fileReader = new FileReader("train.txt");
        ObjectStream fileStream = new PlainTextByLineStream(fileReader);
        ObjectStream sampleStream = new NameSampleDataStream(fileStream);
        TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
        NameFinderME nfm = new NameFinderME(model);
    }

    public static void main(String args[]) throws IOException
    {
        NameTraining det = new NameTraining();
        det.TrainNames();
    }
}
I compile it with the following command:
javac -cp $(echo lib/*.jar | tr ' ' ':') NameTraining.java -Xlint:unchecked
But I get these messages:
NameTraining.java:35: warning: [unchecked] unchecked conversion
found : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<java.lang.String>
ObjectStream sampleStream = new NameSampleDataStream(fileStream);
^
NameTraining.java:36: warning: [unchecked] unchecked conversion
found : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
^
2 warnings
I would like to know two things
Answer 0 (score: 2)
Hi, I managed to train successfully on a small data set:
public static void TrainNames() throws IOException
{
    Charset charset = Charset.forName("UTF-8");
    // Typed streams: lines of text in, NameSample objects out
    ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-ner-person.train"), charset);
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
    //FileReader fileReader = new FileReader("train.txt");
    //ObjectStream fileStream = new PlainTextByLineStream(fileReader);
    //ObjectStream sampleStream = new NameSampleDataStream(fileStream);
    TokenNameFinderModel model = NameFinderME.train("en", "person", sampleStream, Collections.<String, Object>emptyMap());
    NameFinderME nfm = new NameFinderME(model);

    // Read the whole input file into a single string
    String sentence = "";
    BufferedReader br = new BufferedReader(new FileReader("/home/yogi.singh/dev/java/nlp/train.txt"));
    try
    {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        while (line != null)
        {
            sb.append(line);
            sb.append('\n');
            line = br.readLine();
        }
        sentence = sb.toString();
    }
    finally
    {
        br.close();
    }

    // Tokenize the text with a pre-trained tokenizer model
    InputStream is1 = new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-token.bin");
    TokenizerModel model1 = new TokenizerModel(is1);
    Tokenizer tokenizer = new TokenizerME(model1);
    String tokens[] = tokenizer.tokenize(sentence);
    for (String a : tokens)
        System.out.println(a);

    // Find name spans and print each span followed by the tokens it covers
    Span nameSpans[] = nfm.find(tokens);
    for (Span s : nameSpans)
    {
        System.out.print(s.toString());
        System.out.print(" ");
        for (int index = s.getStart(); index < s.getEnd(); index++)
        {
            System.out.print(tokens[index] + " ");
        }
        System.out.println(" ");
    }
}
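As a side note, the file-reading loop above could probably be shortened on Java 7 or later. This is only a sketch using the standard `java.nio.file.Files` API; the class name `ReadWholeFile`, the method `readAll`, and the sample sentence are made up for illustration, and it assumes the input is UTF-8 (the code above uses `FileReader`, which reads with the platform default charset instead).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadWholeFile {
    // Reads every line of the file and joins them with '\n',
    // mirroring what the BufferedReader loop above produces.
    // Assumption: the file is UTF-8 encoded.
    static String readAll(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input written to a temporary file
        Path tmp = Files.createTempFile("train", ".txt");
        try {
            Files.write(tmp, "Ravi Kumar lives in Pune .\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(readAll(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The returned string can then be passed to `tokenizer.tokenize(...)` exactly as `sentence` is above.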
Answer 1 (score: 0)
The warnings are about how Java generics are used, not about OpenNLP itself.
Try this:
ObjectStream<String> fileStream = new PlainTextByLineStream(fileReader);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(fileStream);
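To see that this is plain Java behaviour, here is a minimal sketch that does not use OpenNLP at all. `Stream`, `ListStream` and `LengthStream` are hypothetical stand-ins invented for this example (they only imitate the shape of a wrapped generic stream), and the names in `main` are sample data.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RawTypeDemo {
    // Hypothetical stand-in for a generic stream interface; not an OpenNLP type.
    interface Stream<T> {
        T read(); // returns null when the stream is exhausted
    }

    // Serves the elements of a list one at a time.
    static class ListStream<T> implements Stream<T> {
        private final Iterator<T> it;
        ListStream(List<T> items) { this.it = items.iterator(); }
        public T read() { return it.hasNext() ? it.next() : null; }
    }

    // Wraps a Stream<String> and yields the length of each string.
    static class LengthStream implements Stream<Integer> {
        private final Stream<String> source;
        LengthStream(Stream<String> source) { this.source = source; }
        public Integer read() {
            String s = source.read();
            return s == null ? null : s.length();
        }
    }

    static int sumLengths(List<String> lines) {
        // Declaring this variable as the raw type "Stream" and passing it on
        // would make javac -Xlint:unchecked report an unchecked conversion,
        // because LengthStream requires Stream<String>.
        Stream<String> lineStream = new ListStream<String>(lines);
        Stream<Integer> lengths = new LengthStream(lineStream);
        int total = 0;
        for (Integer n = lengths.read(); n != null; n = lengths.read()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumLengths(Arrays.asList("Ravi", "Priya")));
    }
}
```

With the type arguments spelled out on both variables this should compile without unchecked warnings, which is the same change the two lines above make to the OpenNLP code.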