I have a dataset in R that looks like the one below. I found a similar post, "Counting number of times a value occurs", but it is not quite the same.
id <- c(1,1,1, 2,2,2, 3,3,3,3)
cat.1 <- c("a","a","a","b","b","b","c","c","c","c")
cat.2 <- c("m","m","m","f","f","f","m","m","m","m")
score <- c(-1,0,-1, 1,0,1, -1,0,1,1)
data <- data.frame("id"=id, "cat.1"=cat.1, "cat.2"=cat.2, "score"=score)
data
   id cat.1 cat.2 score
1   1     a     m    -1
2   1     a     m     0
3   1     a     m    -1
4   2     b     f     1
5   2     b     f     0
6   2     b     f     1
7   3     c     m    -1
8   3     c     m     0
9   3     c     m     1
10  3     c     m     1
For each id, I want to count the number of -1 values in the score variable. I also want to keep the cat.1 and cat.2 variables. The desired output would be:
  id cat.1 cat.2 count(-1)
1  1     a     m         2
2  2     b     f         0
3  3     c     m         1
Do you have any suggestions? Thanks!
Answer 0 (score: 5)
Here is one way to do this with dplyr:
library(dplyr)
data %>%
  group_by(id, cat.1, cat.2) %>% # or: group_by_at(vars(-score))
  summarise(count_neg_1 = sum(score == -1)) # TRUE counts as 1, so sum() gives the count
# id cat.1 cat.2 count_neg_1
# 1 1 a m 2
# 2 2 b f 0
# 3 3 c m 1
You can rename the computed column if you like. I generally avoid using anything other than letters, digits, or underscores in variable names.
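If you really do want the column to be called count(-1), as in the desired output, a non-syntactic name has to be wrapped in backticks. A minimal sketch, assuming dplyr 1.0.0 or later (the `.groups` argument is not available in older versions); the name `res` is just for illustration:

```r
library(dplyr)

# Same data as in the question
data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1)
)

res <- data %>%
  group_by(id, cat.1, cat.2) %>%
  # Backticks allow a column name that contains parentheses and a minus sign
  summarise(`count(-1)` = sum(score == -1), .groups = "drop")

res
```

The downside is that every later reference to the column needs the backticks too (for example res$`count(-1)`), which is why a plain name like count_neg_1 is usually less trouble.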
Answer 1 (score: 4)
A base R option could be:
aggregate(score ~ ., FUN = function(x) sum(x == -1), data = data)
id cat.1 cat.2 score
1 2 b f 0
2 1 a m 2
3 3 c m 1
If your data has more variables and you only want to group by these three, you can use aggregate(score ~ id + cat.1 + cat.2, ...).
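To make the explicit-formula variant concrete, here is a rough sketch. The `extra` column is hypothetical, added only to show that it gets ignored; renaming the result column and reordering by id are optional extras, not part of the original answer:

```r
# Same data as in the question, plus a hypothetical extra column
data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1),
  extra = seq_len(10)  # hypothetical column that should not be a grouping variable
)

# Name the grouping variables explicitly instead of using `.`
res <- aggregate(score ~ id + cat.1 + cat.2,
                 data = data,
                 FUN  = function(x) sum(x == -1))

# Optional: give the count a clearer name and sort by id
names(res)[names(res) == "score"] <- "count_neg_1"
res <- res[order(res$id), ]
rownames(res) <- NULL

res
```

With `score ~ .` the `extra` column would have been treated as a fourth grouping variable, giving one row per original row instead of one row per id.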
Answer 2 (score: 4)
library(data.table)
setDT(data)[ , sum(score == -1), by=c('id', 'cat.1', 'cat.2')]
# id cat.1 cat.2 V1
# 1: 1 a m 2
# 2: 2 b f 0
# 3: 3 c m 1
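The result column above gets the default name V1. If you would rather name it directly, something along these lines should work; the name `count_neg_1` and the use of as.data.table() (so the original data frame is not modified by reference) are my own choices here:

```r
library(data.table)

# Same data as in the question
data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1)
)

# as.data.table() makes a copy; setDT() would convert `data` in place
dt <- as.data.table(data)

# .() is data.table's alias for list(); naming the element names the column
res <- dt[, .(count_neg_1 = sum(score == -1)), by = .(id, cat.1, cat.2)]

res
```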
Answer 3 (score: 0)
Another option is count:
library(dplyr)
data %>%
mutate(score = score == -1) %>%
dplyr::count(id, cat.1, cat.2, wt = score)
# A tibble: 3 x 4
# id cat.1 cat.2 n
# <dbl> <fct> <fct> <int>
#1 1 a m 2
#2 2 b f 0
#3 3 c m 1
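This works because count() sums the `wt` column within each group, and a logical TRUE counts as 1. As a possible shortcut, the comparison can go straight into `wt`, and the `name` argument sets the output column name. A sketch under the assumption of a reasonably recent dplyr; `count_neg_1` is just an illustrative name:

```r
library(dplyr)

# Same data as in the question
data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1)
)

# wt is evaluated inside the data, so the logical test can be written inline;
# each group's result is the sum of TRUE values, i.e. the number of -1 scores
res <- data %>%
  count(id, cat.1, cat.2, wt = score == -1, name = "count_neg_1")

res
```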