Twitter sentiment analysis with twitteR: all scores are zero?

Time: 2016-12-20 13:30:21

Tags: r twitter sentiment-analysis

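I'm new to Twitter sentiment analysis with twitteR, and I'm using Hu and Liu's positive.txt and negative.txt word lists. Everything seemed to run fine, but all of the more than 1,000 tweets were scored as neutral (score = 0). I can't figure out what went wrong, and any help would be much appreciated!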

    setup_twitter_oauth(consumer_key, consumer_secret, token, token_secret)

    #Get tweets about "House of Cards", due to the limitation, we'll set n=1500
    netflix.tweets<- searchTwitter("#HouseofCards",n=1500)
    tweet=netflix.tweets[[1]]
    tweet$getScreenName()
    tweet$getText()
    netflix.text=laply(netflix.tweets,function(t)t$getText())
    head(netflix.text) 
    write(netflix.text, "HouseofCards_Tweets.txt", ncolumns = 1)

    #loaded the positive and negative.txt from Hu and Liu
    positive <- scan("/users/xxx/desktop/positive_words.txt", what = character(), comment.char = ";")
    negative <- scan("/users/xxx/desktop/negative_words.txt", what = character(), comment.char = ";")

    #add positive words 
    pos.words =c(positive,"miss","Congratulations","approve","watching","enlightening","killing","solid")

    scoredsentiment <- function(hoc.vec, pos.word, neagtive)
    {
        clean <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "",hoc.vec)
        clean <- gsub("^\\s+|\\s+$", "", clean) 
        clean <- gsub("[[:punct:]]", "", clean)
        clean <- gsub("[^[:graph:]]", "", clean)
        clean <- gsub("[[:cntrl:]]", "", clean)
        clean <- gsub("@\\w+", "", clean)
        clean <- gsub("\\d+", "", clean) 
        clean <- tolower(clean)

        hoc.list <- strsplit(clean, "") 
        hoc=unlist(hoc.list)

        pos.matches = match(hoc, pos.words)
        scoredpositive <- sapply(hoc.list, function(x) sum(!is.na(match(pos.matches, positive))))  
        scorednegative <- sapply(hoc.list, function(x) sum(!is.na(match(x, negative))))
        hoc.df <- data.frame(score = scoredpositive - scorednegative, message = hoc.vec, stringsAsFactors = F)
        return (hoc.df)
    }

    twitter_scores <- scoredsentiment(netflix.text, scoredpositive, scorednegative)
    print(twitter_scores)
    write.csv(twitter_scores, file=paste('twitter_scores.csv'), row.names=TRUE)

    #draw a graph to show the final outcome
    hist(twitter_scores$score)
    qplot(twitter_scores$score)

Everything runs without errors, but every tweet gets the same score (score = 0).

2 answers:

Answer 0: (score: 1)

You can use Microsoft Cognitive Services to compute sentiment scores. Its Text Analytics API can detect sentiment, key phrases, topics, and language in text.

See the linked article, "Sentimental Analysis in R", for how to use Microsoft Cognitive Services from R.

Answer 1: (score: 0)

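Judging from your code, I don't think simple matching will work here. You need some form of fuzzy matching. With match(), a hit requires the exact word to appear, which won't happen very often. On top of that, you are matching individual words against a string of several words.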
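To make the last point concrete, here is a minimal sketch of a scoring function that does compare whole words against the lexicons. It is one possible reading of the problem, not a verified fix for the asker's data. In the question's code, `gsub("[^[:graph:]]", "", clean)` also deletes spaces (a space is not a `[:graph:]` character), and `strsplit(clean, "")` splits into single characters, so no token can ever equal a lexicon word. The word lists and tweets below are made-up stand-ins for the Hu and Liu files and the `searchTwitter()` results, and the name `score_sentiment` is hypothetical.

```r
# Illustrative stand-ins (assumptions): tiny lexicons and hand-written tweets.
pos.words <- c("good", "great", "solid")
neg.words <- c("bad", "boring", "awful")
tweets <- c("RT @fan: Great season, solid acting!",
            "So boring... bad writing",
            "Watching #HouseofCards tonight")

# Splitting on "" yields single characters, which never match a whole word.
chars <- strsplit("great show", "")[[1]]

score_sentiment <- function(texts, pos.words, neg.words) {
  # Replace unwanted pieces with a space (not ""), so words stay separated.
  clean <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", texts, perl = TRUE)
  clean <- gsub("@\\w+", " ", clean, perl = TRUE)   # remaining mentions
  clean <- gsub("[[:punct:]]", " ", clean)          # punctuation
  clean <- gsub("[[:digit:]]+", " ", clean)         # numbers
  clean <- tolower(clean)

  # Split each tweet on runs of whitespace to get a list of word vectors.
  words <- strsplit(trimws(clean), "\\s+")

  # Lower-case the lexicons too, since the tweets were lower-cased.
  pos <- tolower(pos.words)
  neg <- tolower(neg.words)

  # Per tweet: count of positive words minus count of negative words.
  scores <- vapply(words,
                   function(w) sum(w %in% pos) - sum(w %in% neg),
                   integer(1))

  data.frame(score = scores, message = texts, stringsAsFactors = FALSE)
}

result <- score_sentiment(tweets, pos.words, neg.words)
print(result)
```

Exact word matching of this kind is still crude: it misses inflected forms and misspellings, which is where the fuzzy matching suggested above could help. It should at least produce non-zero scores whenever a tweet contains a lexicon word. Note also that the function uses its arguments; the question's version ignores them and calls `match(pos.matches, positive)` on index positions rather than on words.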