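I came across a very nice R script for scoring the sentiment of each sentence, available as sentiment.R, and I would like to know how I can replace this part: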
# split into words. str_split is in the stringr package
word.list = str_split(sentence, '\\s+')
# sometimes a list() is one level of hierarchy too much
words = unlist(word.list)
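so that multi-word terms are matched against pos and neg dictionaries that also contain multi-word terms. I have an example below.
I have the following data.frame: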
sent <- data.frame(words = c("just right size", "love this quality",
"good quality", "very good quality", "i hate this notebook",
"great improvement", "notebook is not good","notebook was"), user = c(1,2,3,4,5,6,7,8))
                 words user
1      just right size    1
2    love this quality    2
3         good quality    3
4    very good quality    4
5 i hate this notebook    5
6    great improvement    6
7 notebook is not good    7
8         notebook was    8
Then I have dictionaries of pos and neg words:
posWords <- c("great","improvement","love","great improvement","very good","good","right","very")
negWords <- c("hate","bad","not good","horrible")
The desired output is as follows:
                 words user SentimentScore
1      just right size    1              1
2    love this quality    2              1
3         good quality    3              1
4    very good quality    4              1
5 i hate this notebook    5             -1
6    great improvement    6              1
7 notebook is not good    7             -1
8         notebook was    8              0
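How should I rewrite the code on GitHub to get the desired output? I mean, if I use the source code from GitHub as is, then, for example, in the fourth row the SentimentScore column would be 2 instead of 1.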
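To make the problem concrete, here is a minimal sketch of word-level scoring. It assumes the script counts single-word hits with match() after splitting on whitespace; scoreByWord is a hypothetical helper written for this illustration, not a function from sentiment.R, and it uses base strsplit() in place of str_split() so it runs without stringr.

```r
# Sketch only: assumes scoring is done word by word with match().
posWords <- c("great", "improvement", "love", "great improvement",
              "very good", "good", "right", "very")
negWords <- c("hate", "bad", "not good", "horrible")

# Hypothetical helper, not part of sentiment.R.
scoreByWord <- function(sentence) {
  # Split into single words (base-R counterpart of str_split + unlist)
  words <- unlist(strsplit(sentence, "\\s+"))
  # Count words found in each dictionary; multi-word entries can never match
  sum(!is.na(match(words, posWords))) - sum(!is.na(match(words, negWords)))
}

scoreByWord("very good quality")     # "very" and "good" are counted separately: 2
scoreByWord("notebook is not good")  # "not good" is never seen, only "good": 1
```

Because each sentence is reduced to single words before matching, entries such as "very good" and "not good" have no chance to match, which is why row 4 scores 2 and row 7 comes out positive.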
Could someone please suggest an approach or a similar solution? I would appreciate any help. Thank you very much in advance.
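Answer 0 (score: 1)
I haven't looked at the library you mentioned, but this might be what you want. I created a data frame holding the positive and negative words and assigned each one a value of +1 or -1. Then I gave each entry a length value to sort by, so that the longest words/phrases are tried first.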
sent <- data.frame(words = c("just right size", "love this quality",
"good quality", "very good quality", "i hate this notebook",
"great improvement", "notebook is not good"), user = c(1,2,3,4,5,6,7),
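stringsAsFactors=FALSE)
posWords <- c("great","improvement","love","great improvement","very good","good","right","very")
negWords <- c("hate","bad","not good","horrible")
# Combine both dictionaries into one lookup table: +1 for positive, -1 for negative
wordsDF <- data.frame(words = posWords, value = 1, stringsAsFactors=FALSE)
wordsDF <- rbind(wordsDF, data.frame(words = negWords, value = -1, stringsAsFactors=FALSE))
# Sort the longest terms first so "very good" is tried before "very" or "good"
wordsDF$lengths <- nchar(wordsDF$words)
wordsDF <- wordsDF[order(-wordsDF$lengths),]
scoreSentence <- function(sentence){
  score <- 0
  for(x in seq_len(nrow(wordsDF))){
    # For a single sentence, grepl() is TRUE if the term occurs at least once
    if(grepl(wordsDF[x,1], sentence)){
      score <- score + wordsDF[x,2]
      # Remove the matched term so shorter terms inside it are not counted again
      sentence <- sub(wordsDF[x,1], '', sentence)
    }
  }
  score
}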
SentimentScore<- unlist(lapply(sent$words, scoreSentence))
cbind(sent, SentimentScore)
Output:
                 words user SentimentScore
1      just right size    1              1
2    love this quality    2              1
3         good quality    3              1
4    very good quality    4              1
5 i hate this notebook    5             -1
6    great improvement    6              1
7 notebook is not good    7             -1
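One caveat with plain pattern matching is that a term also matches inside longer words, so "bad" would hit "badge". The following is a sketch of a variant that wraps each term in \b word boundaries; the names lexicon and scoreWholeWords are hypothetical, and it assumes the dictionary terms contain no regex metacharacters.

```r
posWords <- c("great", "improvement", "love", "great improvement",
              "very good", "good", "right", "very")
negWords <- c("hate", "bad", "not good", "horrible")

# One lookup table, longest terms first (same idea as above)
lexicon <- data.frame(term = c(posWords, negWords),
                      value = c(rep(1, length(posWords)), rep(-1, length(negWords))),
                      stringsAsFactors = FALSE)
lexicon <- lexicon[order(-nchar(lexicon$term)), ]

# Hypothetical helper: scores one sentence, matching whole words/phrases only.
scoreWholeWords <- function(sentence) {
  score <- 0
  for (i in seq_len(nrow(lexicon))) {
    # Assumes terms are plain text with no regex metacharacters
    pattern <- paste0("\\b", lexicon$term[i], "\\b")
    if (grepl(pattern, sentence, perl = TRUE)) {
      score <- score + lexicon$value[i]
      # Blank out the matched term so its sub-terms are not scored again
      sentence <- sub(pattern, " ", sentence, perl = TRUE)
    }
  }
  score
}

sentences <- c("very good quality", "notebook is not good", "notebook was", "a badge")
scores <- vapply(sentences, scoreWholeWords, numeric(1), USE.NAMES = FALSE)
scores  # 1 -1 0 0
```

Like the loop above, this counts each dictionary term at most once per sentence; "notebook was" contains no dictionary term and therefore scores 0.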