Training OpenNLP for Thai

Time: 2017-03-14 14:55:43

Tags: java nlp opennlp

I am trying to train a Thai model using OpenNLP 1.7.2 and maxent-3.0.0.jar. Below is the code that reads the Thai training data and creates the bin model.

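import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainPerson {

    public static void main(String[] args) throws IOException {
        String trainFile = "/Documents/workspace/ThaiOpenNLP/bin/thaiPerson.train";
        String modelFile = "/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
        writePersonModel(trainFile, modelFile);
    }

    private static void writePersonModel(String trainFile, String modelFile) throws IOException {
        // Read the training file line by line as UTF-8; each line is one annotated sentence.
        InputStreamFactory inputFactory = new MarkableFileInputStreamFactory(new File(trainFile));
        ObjectStream<String> lineStream = new PlainTextByLineStream(inputFactory, StandardCharsets.UTF_8);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        // Train a "person" name finder for language code "th" with the default parameters.
        TokenNameFinderModel model;
        try {
            model = NameFinderME.train("th", "person", sampleStream,
                    TrainingParameters.defaultParams(), new TokenNameFinderFactory());
        } finally {
            sampleStream.close();
        }

        // Serialize the trained model; try-with-resources closes the stream.
        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(modelOut);
        }
    }
}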

The Thai training data looks like the attached file trainingData.
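Since the attachment is not reproduced here, it may help to sanity-check the file against the format the OpenNLP name finder expects: one whitespace-tokenized sentence per line, with each name wrapped in `<START:person>` and `<END>` tags that are themselves separated from the neighbouring tokens by spaces. Thai is normally written without spaces between words, so the training text has to be segmented beforehand. The class below is only a rough sketch of such a check using plain Java; `TrainingLineCheck` and `annotatedNames` are hypothetical names, not part of OpenNLP, and the check is looser than OpenNLP's own parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of OpenNLP): sanity-checks name-finder training lines. */
public class TrainingLineCheck {

    /**
     * Returns the names annotated in one training line.
     * Throws IllegalArgumentException if the START/END tags are malformed.
     * Tokens are split on whitespace, so every tag must be a standalone token.
     */
    public static List<String> annotatedNames(String line) {
        List<String> names = new ArrayList<>();
        StringBuilder current = null; // non-null while between <START...> and <END>
        for (String token : line.trim().split("\\s+")) {
            if (token.startsWith("<START") && token.endsWith(">")) {
                if (current != null) {
                    throw new IllegalArgumentException("nested <START> tag");
                }
                current = new StringBuilder();
            } else if (token.equals("<END>")) {
                if (current == null) {
                    throw new IllegalArgumentException("<END> without <START>");
                }
                if (current.length() == 0) {
                    throw new IllegalArgumentException("empty name");
                }
                names.add(current.toString());
                current = null;
            } else if (token.contains("<START") || token.contains("<END>")) {
                // A tag glued to a word would not be seen as a tag after whitespace splitting.
                throw new IllegalArgumentException("tag not separated by whitespace: " + token);
            } else if (current != null) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
        }
        if (current != null) {
            throw new IllegalArgumentException("missing <END>");
        }
        return names;
    }

    /** Checks every line of the UTF-8 training file given as the only argument. */
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TrainingLineCheck <training-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            try {
                System.out.println((i + 1) + ": " + annotatedNames(lines.get(i)));
            } catch (IllegalArgumentException e) {
                System.out.println((i + 1) + ": malformed (" + e.getMessage() + ")");
            }
        }
    }
}
```

Running it on thaiPerson.train would list the names found on each line, or flag lines where a tag is glued to a word. A line such as `<START:person> จอห์น <END> 30 ปี` yields one name, จอห์น. A clean report does not guarantee a useful model; it only rules out formatting problems.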

I am using the output model to detect person names, as shown in the program below. It fails to recognize the names.

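import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class ThaiPersonNameFinder {

    static String modelFile = "/Users/avinashpaula/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";

    public static void main(String[] args) {
        // try-with-resources closes the model stream, which the original version left open.
        try (InputStream modelIn = new FileInputStream(modelFile)) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME nameFinder = new NameFinderME(model);

            // The sentence must already be tokenized: one array element per token.
            String[] sentence = new String[] {
                "จอห์น",
                "30",
                "ปี",
                "จะ",
                "เข้าร่วม",
                "ก",
                "เริ่มต้น",
                "ขึ้น",
                "บน",
                "มกราคม",
                "."
            };

            // Each Span holds the start (inclusive) and end (exclusive) token index of a name.
            Span[] nameSpans = nameFinder.find(sentence);
            for (Span span : nameSpans) {
                System.out.println(span);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}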

What am I doing wrong?

0 answers:

No answers