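I am building a tool in Java (using Eclipse) that determines whether a sentence contains particular words.
I use the twitter4j library to search Twitter for tweets.
I tag the tweets with the Stanford NLP tagger and then store the result in a text file.
Here is the code: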
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TextTag {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Initialize the tagger
        MaxentTagger tagger = new MaxentTagger("taggers/english-left3words-distsim.tagger");
        // The sample string
        String sample = "Output Tagged";
        // The tagged string
        String tagged = tagger.tagString(sample);
        // Output the tagged sample string to the console
        // System.out.println(tagged);

        // Read sentences line by line from Output.txt and append the tagged
        // sentences to EntityTagged.txt. Both files are opened once and
        // closed automatically by try-with-resources.
        try (BufferedReader br = new BufferedReader(new FileReader("Output.txt"));
             BufferedWriter out = new BufferedWriter(new FileWriter("EntityTagged.txt", true))) {
            while ((sample = br.readLine()) != null) {
                // Tag the line
                tagged = tagger.tagString(sample);
                // Write it to EntityTagged.txt
                out.write(tagged);
                out.newLine();
            }
        }
    }
}
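My next step is to take the tagged tweets from EntityTagged.txt and compare them against a list of positive words and a list of negative words.
I have created two text files, one with positive words and one with negative words. My goal is to loop through the 10 tagged tweets in EntityTagged.txt and check each one against positive.txt and negative.txt to see whether any of those words appear, so that I can tell whether the tweet is positive or negative.
My final result should look like this:
Tweet 1: positive Tweet 2: negative Tweet 3: negative
and so on.
At the moment I am struggling to come up with an algorithm that does this.
Any help is much appreciated.
Thanks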
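Answer 0 (score: 0)
Here is my five-minute algorithm. Store the positive and negative words as delimited strings. Then loop through the words in the tweet and check whether each one occurs in either delimited string. You will have to extend the split regex to cover all special characters: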
String positiveWords = "|nice|happy|great|";
positiveWords = positiveWords.toLowerCase();
String negativeWords = "|bad|awful|mean|yuck|sad|";
negativeWords = negativeWords.toLowerCase();
String tweetOne = "nice day happy not sad at all";
tweetOne = tweetOne.toLowerCase();
String[] arrWords = tweetOne.split("\\s");
int value = 0;
for (int i = 0; i < arrWords.length; i++) {
    // Wrapping the word in delimiters avoids matching substrings of longer words
    if (positiveWords.indexOf("|" + arrWords[i] + "|") != -1) {
        System.out.println("POS word(+1): " + arrWords[i]);
        value++;
    }
    if (negativeWords.indexOf("|" + arrWords[i] + "|") != -1) {
        System.out.println("NEG word(-1): " + arrWords[i]);
        value--;
    }
}
System.out.println("positive/negative value: " + value);
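To apply the same counting idea to the files from the question, something like the following might work. It is only a sketch under a few assumptions: positive.txt and negative.txt hold one word per line, and each token in EntityTagged.txt has the form word_TAG (as in happy_JJ), so the tag has to be stripped before the lookup. The class name TweetSentiment, its helper methods, and the "neutral" label for a score of zero are made up for illustration and are not part of the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TweetSentiment {

    // Load one word per line into a lower-cased set; blank lines are skipped.
    static Set<String> loadWords(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Assumption: tokens look like "word_TAG". Drop everything from the last underscore on.
    static String stripTag(String token) {
        int idx = token.lastIndexOf('_');
        return idx > 0 ? token.substring(0, idx) : token;
    }

    // Score = number of positive words minus number of negative words.
    static int score(String taggedTweet, Set<String> positive, Set<String> negative) {
        int value = 0;
        for (String token : taggedTweet.trim().split("\\s+")) {
            // Lower-case the word and remove anything that is not a letter or apostrophe
            String word = stripTag(token).toLowerCase(Locale.ROOT).replaceAll("[^a-z']", "");
            if (positive.contains(word)) {
                value++;
            }
            if (negative.contains(word)) {
                value--;
            }
        }
        return value;
    }

    // Hypothetical labelling rule: a tie is reported as "neutral".
    static String label(int score) {
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) throws IOException {
        Set<String> positive = loadWords(Paths.get("positive.txt"));
        Set<String> negative = loadWords(Paths.get("negative.txt"));
        List<String> tweets = Files.readAllLines(Paths.get("EntityTagged.txt"), StandardCharsets.UTF_8);
        for (int i = 0; i < tweets.size(); i++) {
            int value = score(tweets.get(i), positive, negative);
            System.out.println("Tweet " + (i + 1) + ": " + label(value));
        }
    }
}
```

A HashSet lookup replaces the delimited-string search, which keeps the check exact and fast as the word lists grow. Like the snippet above, this only counts words, so a phrase such as "not sad" still counts as negative.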