Question

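I'm new to Twitter sentiment analysis with twitteR, and I'm using Hu and Liu's positive.txt and negative.txt word lists. Everything runs without errors, but all of the 1000+ tweets come out neutral (score = 0). I can't figure out what went wrong, and any help would be much appreciated!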

    setup_twitter_oauth(consumer_key, consumer_secret, token, token_secret)

    #Get tweets about "House of Cards", due to the limitation, we'll set n=1500
    netflix.tweets<- searchTwitter("#HouseofCards",n=1500)
    tweet=netflix.tweets[[1]]
    tweet$getScreenName()
    tweet$getText()
    netflix.text=laply(netflix.tweets,function(t)t$getText())
    head(netflix.text) 
    write(netflix.text, "HouseofCards_Tweets.txt", ncolumns = 1)

    #loaded the positive and negative.txt from Hu and Liu
    positive <- scan("/users/xxx/desktop/positive_words.txt", what = character(), comment.char = ";")
    negative <- scan("/users/xxx/desktop/negative_words.txt", what = character(), comment.char = ";")

    #add positive words 
    pos.words =c(positive,"miss","Congratulations","approve","watching","enlightening","killing","solid")

    scoredsentiment <- function(hoc.vec, pos.word, neagtive)
    {
        clean <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "",hoc.vec)
        clean <- gsub("^\\s+|\\s+$", "", clean) 
        clean <- gsub("[[:punct:]]", "", clean)
        clean <- gsub("[^[:graph:]]", "", clean)
        clean <- gsub("[[:cntrl:]]", "", clean)
        clean <- gsub("@\\w+", "", clean)
        clean <- gsub("\\d+", "", clean) 
        clean <- tolower(clean)

        hoc.list <- strsplit(clean, "") 
        hoc=unlist(hoc.list)

        pos.matches = match(hoc, pos.words)
        scoredpositive <- sapply(hoc.list, function(x) sum(!is.na(match(pos.matches, positive))))  
        scorednegative <- sapply(hoc.list, function(x) sum(!is.na(match(x, negative))))
        hoc.df <- data.frame(score = scoredpositive - scorednegative, message = hoc.vec, stringsAsFactors = F)
        return (hoc.df)
    }

    twitter_scores <- scoredsentiment(netflix.text, scoredpositive, scorednegative)
    print(twitter_scores)
    write.csv(twitter_scores, file=paste('twitter_scores.csv'), row.names=TRUE)

    #draw a graph to show the final outcome
    hist(twitter_scores$score)
    qplot(twitter_scores$score)

Everything runs fine, but every tweet gets the same score (score = 0).

Answer 1

You can use Microsoft Cognitive Services to compute sentiment scores. The Microsoft Cognitive Services Text Analytics API can detect sentiment, key phrases, topics, and language in text.

See this link for using Microsoft Cognitive Services in R: Sentimental Analysis in R

Answer 2

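From your code, I don't think the simple match works. You need some form of fuzzy matching scheme. With match, the exact word has to recur, which won't happen often. Moreover, you are matching a single word against a string of words.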
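A minimal sketch of that mismatch and one way around it, using exact matching on whole words. The word lists and tweets here are made-up stand-ins for the Hu and Liu files and the searchTwitter() results, and score_sentiment is a hypothetical name rather than the function from the question. In the question's code, `gsub("[^[:graph:]]", "", clean)` also deletes the spaces and `strsplit(clean, "")` splits into single characters, so no token can ever equal a dictionary word.

```r
# Hypothetical stand-ins for the Hu and Liu lists and the downloaded tweets.
pos.words <- c("good", "great", "solid")
neg.words <- c("bad", "boring", "awful")
tweets <- c("RT @fan: House of Cards is GREAT, solid writing!",
            "That episode was boring and bad...",
            "Watching #HouseofCards tonight")

# Why every score is 0 in the question's version:
no_spaces <- gsub("[^[:graph:]]", "", "great show")  # spaces are removed too
chars <- strsplit("great show", "")[[1]]             # single characters, not words
all_na <- all(is.na(match(chars, pos.words)))        # TRUE: nothing matches

score_sentiment <- function(text, pos.words, neg.words) {
  clean <- gsub("@\\w+", " ", text)           # drop @mentions
  clean <- gsub("[[:punct:]]", " ", clean)    # punctuation -> space, keeps word boundaries
  clean <- gsub("[[:digit:]]+", " ", clean)   # drop numbers
  clean <- tolower(clean)

  # Split each tweet on whitespace so every element is a whole word.
  words <- strsplit(trimws(clean), "\\s+")

  # Count dictionary hits per tweet (one list element per tweet).
  pos <- vapply(words, function(w) sum(w %in% pos.words), integer(1))
  neg <- vapply(words, function(w) sum(w %in% neg.words), integer(1))

  data.frame(score = pos - neg, message = text, stringsAsFactors = FALSE)
}

scores <- score_sentiment(tweets, pos.words, neg.words)
print(scores$score)  # 2 -2 0
```

The word lists are passed in as arguments and used inside the function, and the counts are computed per tweet instead of on the flattened vector. If approximate matching is still wanted after that, base R's agrep() is one place to start.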

Twitter sentiment analysis with twitteR: all scores are zero?
