Sentiment analysis using OpenNLP

Time: 2017-06-27 13:07:05

Tags: machine-learning sentiment-analysis opennlp

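I found a description of how to implement a sentiment analysis task with OpenNLP. In my case I am using the latest OpenNLP version, 1.8.0. The example linked below uses a maximum entropy model, and I use the same input file, input.txt (tweets.txt).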

http://technobium.com/sentiment-analysis-using-opennlp-document-categorizer/

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

import opennlp.tools.doccat.DoccatFactory;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class StartSentiment {

    public static DoccatModel model = null;
    public static String[] analyzedTexts = {"I hate Mondays!"/*, "Electricity outage, this is a nightmare", "I love it"*/};

    public static void main(String[] args) throws IOException {
        // Train the model, then classify each sample text
        trainModel();
        for (int i = 0; i < analyzedTexts.length; i++) {
            classifyNewText(analyzedTexts[i]);
        }
    }

    // Reads a whole file into a string (not called by main)
    private static String readFile(String pathname) throws IOException {
        File file = new File(pathname);
        StringBuilder fileContents = new StringBuilder((int) file.length());
        String lineSeparator = System.getProperty("line.separator");

        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                fileContents.append(scanner.nextLine()).append(lineSeparator);
            }
            return fileContents.toString();
        }
    }

    public static void trainModel() {
        // try-with-resources closes the streams (and the underlying file) after training
        try (ObjectStream<String> lineStream = new PlainTextByLineStream(
                     new MarkableFileInputStreamFactory(new File("bin/text.txt")),
                     StandardCharsets.UTF_8);
             ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream)) {

            TrainingParameters tp = new TrainingParameters();
            tp.put(TrainingParameters.CUTOFF_PARAM, "2");
            tp.put(TrainingParameters.ITERATIONS_PARAM, "30");

            DoccatFactory df = new DoccatFactory();
            model = DocumentCategorizerME.train("en", sampleStream, tp, df);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void classifyNewText(String text) {
        DocumentCategorizerME myCategorizer = new DocumentCategorizerME(model);

        // Note: the whole text is passed here as a single token;
        // categorize(String[]) takes an array of tokens.
        double[] outcomes = myCategorizer.categorize(new String[]{text});
        String category = myCategorizer.getBestCategory(outcomes);

        if (category.equalsIgnoreCase("1")) {
            System.out.println("The text is positive");
        } else {
            System.out.println("The text is negative");
        }
    }
}

In my case, no matter what input string I use, the text is always classified as positive. Any ideas what the cause might be?

Thanks
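One possible cause worth checking (an assumption, not something confirmed in this thread): `DocumentSampleStream` splits each training line on whitespace, so the model learns per-word features, while `classifyNewText` hands the whole sentence to `categorize` as a single token. If that is what happens, none of the input's features were seen in training, and the classifier may fall back to the same category every time. The sketch below only imitates that mismatch with plain Java; `TokenMismatchSketch`, `bagOfWords`, `knownFeatures`, and the training line are hypothetical and are not OpenNLP code or data from tweets.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TokenMismatchSketch {

    // Hypothetical stand-in for a bag-of-words feature generator:
    // emits one "bow=<token>" feature per token.
    public static List<String> bagOfWords(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            features.add("bow=" + token);
        }
        return features;
    }

    // Counts how many features of the input were also seen during training.
    public static int knownFeatures(Set<String> trained, String[] tokens) {
        int hits = 0;
        for (String feature : bagOfWords(tokens)) {
            if (trained.contains(feature)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up training line in "category text" form, split on whitespace
        String trainingLine = "0 I hate rainy Mondays";
        String[] parts = trainingLine.split("\\s+");
        Set<String> trained = new HashSet<>(
                bagOfWords(Arrays.copyOfRange(parts, 1, parts.length)));

        String text = "I hate Mondays";
        // Whole text as one token: no feature matches the training data
        System.out.println(knownFeatures(trained, new String[]{text}));  // prints 0
        // Whitespace tokens: every word matches a trained feature
        System.out.println(knownFeatures(trained, text.split("\\s+")));  // prints 3
    }
}
```

In the question's code, the corresponding change would be to pass `WhitespaceTokenizer.INSTANCE.tokenize(text)` (from `opennlp.tools.tokenize`) to `categorize` instead of `new String[]{text}`. Whether that resolves the always-positive result has not been tested here.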

0 answers:

No answers