How can we run the Stanford Classifier on an array of strings?

Time: 2015-02-19 08:32:14

Tags: stanford-nlp

I have an array of strings:

String strarr[] = {
        "What a wonderful day",  
        "beautiful beds",
        "food was awesome"
    };

I also have a training data set:

Room    What a beautiful room
Room    Wonderful sea-view
Room    beds are comfortable
Room    bed-spreads are good
Food    The dinner was marvellous
Food    Tasty foods
Service people are rude
Service waitors were not on time
Service service was horrible

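Programmatically, I am unable to get the scores and labels for the strings I want to classify. It does work if I use the training data set together with a test data set that has two columns. My problem is that, in practice, I cannot tell which label belongs to each of the strings in my array.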

How can I get the classifier to run on the array directly, rather than creating a data set file?

I get an error when I try to compute the scores:

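import edu.stanford.nlp.classify.Classifier;
import edu.stanford.nlp.classify.ColumnDataClassifier;
import edu.stanford.nlp.ling.Datum;

// Train a classifier from the properties file and the training data.
ColumnDataClassifier cdc = new ColumnDataClassifier("examples/drogo.prop");
Classifier<String, String> cl
    = cdc.makeClassifier(cdc.readTrainingExamples("examples/drogo.train"));

for (String li : strarr) {
    // This call throws the ArrayIndexOutOfBoundsException shown below.
    Datum<String, String> d = cdc.makeDatumFromLine(li);

    System.out.println(li + "  ==>  " + cl.classOf(d) + " (score: " + cl.scoresOf(d) + ")");
}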

Error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatum(ColumnDataClassifier.java:738)
    at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatumFromStrings(ColumnDataClassifier.java:275)
    at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatumFromLine(ColumnDataClassifier.java:245)
    at alchemypoc.DrogoClassifier.main(DrogoClassifier.java:55)
Java Result: 1

1 Answer:

Answer 0: (score: 0)

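OK, so I did the following and it seems to work now. Since it is a ColumnDataClassifier, it expects columnar data, so I added a tab before each sentence. That leaves the first (label) column empty and puts the text in the second column.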

String strarr[] = {
        "\tWhat a wonderful day",
        "\tbeautiful beds",
        "\tfood was awesome"
    };

It now gives me values:

What a wonderful day  ==>  Room (score: {Service=-0.6692784244930884, Room=1.4113604761865859, Food=-0.7420810715491954})
    beautiful beds  ==>  Room (score: {Service=-2.1042147142001038, Room=3.888249805012589, Food=-1.7840358277259})
    food was awesome  ==>  Food (score: {Service=-0.44203328206155995, Room=-0.9779506257026013, Food=1.4199861760769543})
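To illustrate why the leading tab helps, here is a minimal sketch in plain Java with no Stanford dependency. It assumes the classifier splits each line on tab characters and reads the text from the second column, which is consistent with the ArrayIndexOutOfBoundsException: 1 in the stack trace but depends on what drogo.prop actually configures. The class ColumnLines and its methods are hypothetical helpers for illustration, not part of the Stanford API. They also let you keep the original array unchanged and add the tab only at classification time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of Stanford NLP.
class ColumnLines {

    // Prefix an empty first (label) column so the sentence becomes column 1.
    static String toColumnLine(String sentence) {
        return "\t" + sentence;
    }

    // Count tab-separated columns in a line. This only mimics the assumed
    // splitting behaviour; it is not the library's own code.
    static int columnCount(String line) {
        return line.split("\t", -1).length;
    }

    // Convert a whole array of raw sentences into column-formatted lines.
    static List<String> toColumnLines(String[] sentences) {
        List<String> lines = new ArrayList<>();
        for (String s : sentences) {
            lines.add(toColumnLine(s));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] strarr = {
            "What a wonderful day",
            "beautiful beds",
            "food was awesome"
        };
        for (String s : strarr) {
            String line = toColumnLine(s);
            // A raw sentence has one column, so index 1 does not exist;
            // the tab-prefixed line has two columns.
            System.out.println(columnCount(s) + " -> " + columnCount(line) + " columns: " + s);
        }
    }
}
```

With the classifier from the question, the loop body would then pass ColumnLines.toColumnLine(li) to cdc.makeDatumFromLine instead of li, so the printed sentence stays free of the leading tab.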

If anyone has a different answer or a more correct way to do this, please post it.