How do I train a name finder model in OpenNLP?

Asked: 2015-07-20 14:22:41

Tags: java machine-learning nlp opennlp named-entity-recognition

I am trying to train a name finder model to detect names, but it does not give correct results. Here is the code:

protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        InputStream is=null;
        Resources resources=this.getResources();
        assetManager=resources.getAssets();

        String trainingDataFile = "en-ner-person.train";
        String outputModelFile = "en-ner-person.bin";
        String sentence[] = {"Sunil", "61 years old , will join the board as a nonexecutive director Nov. 29" };

        train(trainingDataFile, outputModelFile, "person");
        try {
            predict(sentence, outputModelFile);
        }
        catch(Exception e)
        {
            System.out.println("Errror Preditct" + e.getMessage());
        }
    }

    private static void train(String trainingDataFile, String outputModelFile, String tagToFind) {

        NameSampleDataStream nss = null;
        try {
            nss = new NameSampleDataStream(new PlainTextByLineStream(new java.io.FileReader(trainingDataFile)));
        } catch (Exception e) {}

        TokenNameFinderModel model = null;

        try {

            model = NameFinderME.train("en", tagToFind, nss, Collections.<String, Object>emptyMap());
        } catch(Exception e) {}

        try {
            File outFile = new File(outputModelFile);
            FileOutputStream outFileStream = new FileOutputStream(outFile);
            model.serialize(outFileStream);
        }
        catch (Exception ex) {}
    }

    private  void predict(String sentence[], String modelFile) throws Exception {
        InputStream is1 ;
        is1 = assetManager.open("en-ner-person.bin",MODE_PRIVATE);

        TokenNameFinderModel model1 = new TokenNameFinderModel(is1);

        String sd;
        NameFinderME nameFinder = new NameFinderME(model1);
        Span sp[] = nameFinder.find(sentence);

        String a[] = Span.spansToStrings(sp, sentence);
        StringBuilder fd = new StringBuilder();
        int l = a.length;

        for (int j = 0; j < l; j++) {
            fd = fd.append(a[j] + "\n");

        }
        sd = fd.toString();
        Log.d("Name Detected:", sd);

    }



}

This is the output:

D/Name Detected:: [ 07-20 19:35:47.516 8799:8799 I/Adreno-EGL ]

The contents of en-ner-person.train are:

<START:person> Sunil <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
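
For reference, each line of a name finder training file is one whitespace-tokenized sentence, with every name wrapped in <START:type> and <END> markers. The OpenNLP manual also suggests training on roughly 15,000 sentences for a model that performs well, so a single line is unlikely to be enough. The following is a minimal sketch, assuming that format, that lists the annotations found in one line so the file can be sanity-checked. TrainingLineCheck and annotatedNames are hypothetical names and are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: lists the names annotated in one training line.
final class TrainingLineCheck {
    // Matches "<START:type> some tokens <END>" and captures the type and the tokens.
    private static final Pattern ANNOTATION =
            Pattern.compile("<START:(\\w+)>\\s+(.+?)\\s+<END>");

    // Returns one "type: text" entry per annotation found in the line.
    static List<String> annotatedNames(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = ANNOTATION.matcher(line);
        while (m.find()) {
            names.add(m.group(1) + ": " + m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "<START:person> Sunil <END> , 61 years old , will join the board "
                + "as a nonexecutive director Nov. 29 .";
        System.out.println(annotatedNames(line)); // prints [person: Sunil]
    }
}
```

An empty list for a line that should contain a name usually points to a missing space around the markers or a misspelled tag.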

Please help.

1 answer:

Answer 0 (score: 0)

Try this:

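 // Run the finder on the token array (one token per element)
 Span[] sp = nameFinder.find(sentence);
 // Forget adaptive data collected by find() once the document is done
 nameFinder.clearAdaptiveData();

 for (Span span : sp) {
   // getEnd() is exclusive, so this visits every token inside the span
   for (int i = span.getStart(); i < span.getEnd(); i++) {
     fd.append(sentence[i]).append("\n");
   }
 }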
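
The loop above indexes into the sentence array by token position, so it only works if the array holds one token per element. In the question, the second element is a whole phrase rather than a token. Here is a minimal sketch of the idea, assuming plain whitespace tokenization and using two ints as a stand-in for Span.getStart() and Span.getEnd() so that it runs without OpenNLP. SpanTokens, tokenize and covered are hypothetical names. OpenNLP also ships its own tokenizers, such as WhitespaceTokenizer, which may be a better fit in real code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: start/end stand in for Span.getStart()/getEnd() (end is exclusive).
final class SpanTokens {
    // Naive whitespace tokenization; assumes punctuation is already space-separated.
    static String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }

    // Collects the tokens covered by the half-open range [start, end).
    static List<String> covered(String[] tokens, int start, int end) {
        List<String> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            out.add(tokens[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Sunil , 61 years old , will join the board");
        // Pretend the name finder returned a span covering token 0 only.
        System.out.println(covered(tokens, 0, 1)); // prints [Sunil]
    }
}
```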

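This line also looks off. StringBuilder.append modifies the builder in place, so there is no need to assign the result back to fd: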

fd = fd.append(a[j] + "\n");
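
The reassignment in that line is harmless but redundant, because append returns the same StringBuilder instance it was called on. A small sketch of the equivalent loop without the reassignment follows. AppendDemo and joinLines are hypothetical names, and the second name in the demo is made-up sample data.

```java
// Illustrative helper (hypothetical name): joins names one per line.
final class AppendDemo {
    static String joinLines(String[] names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            // append mutates sb and returns the same instance, so no reassignment is needed.
            sb.append(name).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        System.out.println(sb.append("Sunil") == sb); // prints true
        System.out.print(joinLines(new String[] {"Sunil", "Anna"}));
    }
}
```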

Hope that helps.