I have one column containing text reviews and another column with ratings:
Content                  Rating
"bluetooth is bad"       1
"head unit crashes"      2
"remote works awesome"   5
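I want to supply a set of keywords and count how often each one occurs in the reviews, broken down by rating.
Simply put, I want to find out what different groups of people (cohorts defined by rating) mention most.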
Rating  Word       Count
1       bluetooth  1
1       head unit  0
1       remote     0
2       bluetooth  0
2       head unit  1
2       remote     0
5       bluetooth  0
5       head unit  0
5       remote     1
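I'm coding again after many years away. Frankly, I've been trying to write a function for this, but I keep running into too many syntax errors.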
Answer 0 (score: 2)
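I think this is what you're after. We can call a function that finds every review containing the word passed to it and counts the matching reviews per rating.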
library(data.table)
dt = data.table(Content = c("bluetooth is bad", "head unit crashes", "remote works awesome", "bluetooth is ok..."), Rating = c(1,2,5,3))
> dt
                Content Rating
1:     bluetooth is bad      1
2:    head unit crashes      2
3: remote works awesome      5
4:   bluetooth is ok...      3
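# Count the reviews that mention `word`, grouped by rating
Count = function(word, dt){
  # keep rows whose Content matches `word` (case-insensitive regex), then count rows per Rating;
  # ratings with no matching review do not appear in the result
  dt = dt[grepl(word, Content, ignore.case = TRUE), .(Count = .N), by = .(Rating)]
  # record which word the counts refer to
  dt[ , Content := word]
  print(dt)
}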
Then we can look at the counts for bluetooth:
Count("bluetooth", dt)
   Rating Count   Content
1:      1     1 bluetooth
2:      3     1 bluetooth
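The question also asks for several keywords at once, with a zero shown when a rating never mentions a keyword. Here is a minimal sketch of one way to extend the idea, not part of the original answer. The helper name `count_keywords`, the table name `reviews`, and the `Word` column are made up for illustration, and each keyword is treated as a case-insensitive regular expression, as in `Count` above.

```r
library(data.table)

# Sample data mirroring the question (illustrative values only)
reviews <- data.table(
  Content = c("bluetooth is bad", "head unit crashes", "remote works awesome"),
  Rating  = c(1, 2, 5)
)

# Hypothetical helper: one row per (Rating, Word) pair, zero counts included
count_keywords <- function(words, dt) {
  per_word <- lapply(words, function(w) {
    # Grouping over every rating (rather than filtering first) keeps
    # ratings with no match, so their count comes out as 0
    dt[, .(Word = w, Count = sum(grepl(w, Content, ignore.case = TRUE))), by = Rating]
  })
  result <- rbindlist(per_word)
  # Order by rating; keywords keep their input order within each rating
  result[order(Rating)]
}

keyword_counts <- count_keywords(c("bluetooth", "head unit", "remote"), reviews)
print(keyword_counts)
```

With the three sample reviews this should give one row per rating/keyword pair in the layout requested in the question. Because the pattern is a regular expression, a keyword containing characters such as `.` or `(` would need escaping, or `fixed = TRUE` in place of `ignore.case = TRUE`.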