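I am working on a sentiment analysis project. I need the semantic orientation of strings or adjectives, so I am using SentiWordNet_3.0.0, as suggested in the Stack Overflow post "How to use SentiWordNet". I ran the code, but I get the following output every time.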
java.lang.ArrayIndexOutOfBoundsException: 2
at qtag.SWN3.<init>(SWN3.java:29)
at qtag.SWN3.main(SWN3.java:105)
0.0
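I have run the code with different input strings, but the result is always the same. I have already removed the first part (the garbage part) of the SentiWordNet_3.0.0_20130122.txt file. What is wrong with my code, and what should I do? Please help. Thanks. Here is my code: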
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Set;
import java.util.Vector;

public class SWN3 {
    private String pathToSWN = "C:/Users/Monalisa/Desktop/SentiWordNet_3.0.0/home/swn/www/admin/dump/SentiWordNet_3.0.0_20130122.txt";
    private HashMap<String, Double>_dict;

    public SWN3(){
        _dict = new HashMap<String, Double>();
        HashMap<String, Vector<Double>> _temp = new HashMap<String, Vector<Double>>();
        try{
            BufferedReader csv = new BufferedReader(new FileReader(pathToSWN));
            String line = "";
            while((line = csv.readLine()) != null)
            {
                String[] data = line.split("\t");
                Double score = Double.parseDouble(data[2])-Double.parseDouble(data[3]);
                String[] words = data[4].split(" ");
                for(String w:words)
                {
                    String[] w_n = w.split("#");
                    w_n[0] += "#"+data[0];
                    int index = Integer.parseInt(w_n[1])-1;
                    if(_temp.containsKey(w_n[0]))
                    {
                        Vector<Double> v = _temp.get(w_n[0]);
                        if(index>v.size())
                            for(int i = v.size();i<index; i++)
                                v.add(0.0);
                        v.add(index, score);
                        _temp.put(w_n[0], v);
                    }
                    else
                    {
                        Vector<Double> v = new Vector<Double>();
                        for(int i = 0;i<index; i++)
                            v.add(0.0);
                        v.add(index, score);
                        _temp.put(w_n[0], v);
                    }
                }
            }
            Set<String> temp = _temp.keySet();
            for (Iterator<String> iterator = temp.iterator(); iterator.hasNext();) {
                String word = iterator.next();
                Vector<Double> v = _temp.get(word);
                double score = 0.0;
                double sum = 0.0;
                for(int i = 0; i < v.size(); i++)
                    score += ((double)1/(double)(i+1))*v.get(i);
                for(int i = 1; i<=v.size(); i++)
                    sum += (double)1/(double)i;
                score /= sum;
                String sent = "";
                if(score>=0.75)
                    sent = "strong_positive";
                else if(score > 0.25 && score<=0.5)
                    sent = "positive";
                else if(score > 0 && score>=0.25)
                    sent = "weak_positive";
                else if(score < 0 && score>=-0.25)
                    sent = "weak_negative";
                else if(score < -0.25 && score>=-0.5)
                    sent = "negative";
                else if(score<=-0.75)
                    sent = "strong_negative";
                _dict.put(word, score);
            }
        }
        catch(Exception e){e.printStackTrace();}
    }

    public Double extract(String word)
    {
        Double total = new Double(0);
        if(_dict.get(word+"#n") != null)
            total = _dict.get(word+"#n") + total;
        if(_dict.get(word+"#a") != null)
            total = _dict.get(word+"#a") + total;
        if(_dict.get(word+"#r") != null)
            total = _dict.get(word+"#r") + total;
        if(_dict.get(word+"#v") != null)
            total = _dict.get(word+"#v") + total;
        return total;
    }

    public static void main(String[] args) {
        SWN3 test = new SWN3();
        String sentence="what a super great day";
        String[] words = sentence.split("\\s+");
        double totalScore = 0;
        for(String word : words) {
            word = word.replaceAll("([^a-zA-Z\\s])", "");
            if (test.extract(word) == null)
                continue;
            totalScore += test.extract(word);
        }
        System.out.println(totalScore);
    }
}
Answer 0 (score: 0)
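I assume you took the source code from the example given on the SentiWordNet site (sentiwordnet.isti.cnr.it). The problem is in the "SentiWordNet_3.0.0_20130122.txt" file: it should start with the first adjective entry, so remove all the comments. There is also some extra whitespace at the end of the file; remove that too. In short, the first line of the file should be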
**a 00001740 0.125 0 able#1 (usually followed by to) having the necessary means or skill or know-how or authority to do something; "able to swim"; "she was ........**
and the last line of the file should be
**v 02772310 0.125 0 deflagrate#1 cause to burn rapidly and with great intensity; "care must be exercised when this substance is to be deflagrated"**
Apart from that, I am not sure whether this counts as an error, but also correct the field declaration to
private HashMap<String,Double> _dict ;
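As an alternative to cleaning the file by hand, the loading loop could skip any line that is not a data row. The following is only a minimal sketch, not the SentiWordNet sample code: the class `SwnLineFilter` and its methods are hypothetical helpers, and it assumes that comment lines start with `#` and that every data row has at least five tab-separated fields (POS, ID, PosScore, NegScore, SynsetTerms), which matches how the question's code indexes `data[0]` to `data[4]`.

```java
// Hypothetical helper, not part of SentiWordNet or of the original SWN3 class.
final class SwnLineFilter {

    // Assumption: a data row has at least these five tab-separated fields:
    // POS, ID, PosScore, NegScore, SynsetTerms (a gloss may follow).
    static final int MIN_FIELDS = 5;

    /** Returns the tab-separated fields of a data row, or null if the line should be skipped. */
    static String[] parseDataLine(String line) {
        if (line == null) {
            return null;
        }
        String trimmed = line.trim();
        // Skip blank lines and (assumed) '#'-prefixed comment lines.
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] data = line.split("\t");
        // Too few fields would make data[2]..data[4] throw ArrayIndexOutOfBoundsException.
        return data.length < MIN_FIELDS ? null : data;
    }

    /** PosScore minus NegScore, computed the same way as in the question's code. */
    static double polarity(String[] data) {
        return Double.parseDouble(data[2]) - Double.parseDouble(data[3]);
    }

    public static void main(String[] args) {
        String[] sample = {
            "# a comment line",
            "a\t00001740\t0.125\t0\table#1\t(usually followed by to) having the necessary means",
            "",
            "#"
        };
        for (String line : sample) {
            String[] data = parseDataLine(line);
            if (data == null) {
                continue; // comment, blank, or malformed line
            }
            System.out.println(data[4] + " -> " + polarity(data)); // prints "able#1 -> 0.125"
        }
    }
}
```

In the `SWN3` constructor, the same check would go right after `line.split("\t")`, with a `continue` for skipped lines, so that leftover comments or trailing blank lines no longer abort the load.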