Counting values in R

Date: 2019-10-24 20:20:09

Tags: r

I have a dataset in R that looks like the one below. I found a similar post, "Counting number of times a value occurs", but it is not quite the same.

id <-     c(1,1,1, 2,2,2, 3,3,3,3)
cat.1 <-  c("a","a","a","b","b","b","c","c","c","c")
cat.2 <-  c("m","m","m","f","f","f","m","m","m","m")
score <-    c(-1,0,-1, 1,0,1, -1,0,1,1)


data <- data.frame("id"=id, "cat.1"=cat.1, "cat.2"=cat.2, "score"=score)
data
   id cat.1 cat.2 score
1   1     a     m    -1
2   1     a     m     0
3   1     a     m    -1
4   2     b     f     1
5   2     b     f     0
6   2     b     f     1
7   3     c     m    -1
8   3     c     m     0
9   3     c     m     1
10  3     c     m     1

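I want to count the number of -1 values in the score variable for each id. I also want to keep the cat.1 and cat.2 variables. The desired output would be: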

   id cat.1 cat.2 count(-1)
1   1     a     m    2
2   2     b     f    0
3   3     c     m    1

Do you have any suggestions? Thanks!

4 answers:

Answer 0: (score: 5)

Here is one way to do it with dplyr:

library(dplyr)

data %>%
    group_by(id, cat.1, cat.2) %>% # or: group_by_at(vars(-score))
    summarise(count_neg_1 = sum(score == -1)) # TRUE counts as 1, so the sum is the number of -1 values


#      id cat.1 cat.2 count_neg_1
# 1     1 a     m               2
# 2     2 b     f               0
# 3     3 c     m               1

You can rename the computed column if you like. I generally avoid using anything other than letters, digits, or underscores in variable names.
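To illustrate the naming point: a column called count(-1), as in the desired output of the question, is not a syntactic R name, so it has to be quoted with backticks every time it is used. The following is a minimal sketch in base R; the data frame df and its values are only made up for illustration.

```r
# Hypothetical small result table, used only to illustrate the naming issue.
df <- data.frame(id = 1:3)

# A non-syntactic name is allowed, but it must be quoted on every access.
df[["count(-1)"]] <- c(2, 0, 1)
df$`count(-1)`

# make.names() shows what R would turn such a name into if asked to sanitise it.
nm <- make.names("count(-1)")
nm

# A name made of letters, digits and underscores needs no quoting at all.
df$count_neg_1 <- df$`count(-1)`
df$count_neg_1
```

Sticking to names like count_neg_1 avoids the backticks in later code.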

Answer 1: (score: 4)

A base R option could be:

aggregate(score ~ ., FUN = function(x) sum(x == -1), data = data)

  id cat.1 cat.2 score
1  2     b     f     0
2  1     a     m     2
3  3     c     m     1

If your data has more variables and you only want to group by these three, you can specify them explicitly with aggregate(score ~ id + cat.1 + cat.2, ...).
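As a sketch of the explicit formula, assuming the data carries one additional column that should not take part in the grouping (the column name extra and the object name data2 are hypothetical, not from the question):

```r
# Same data as in the question, plus a hypothetical extra column.
data2 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1),
  extra = seq_len(10)  # assumption: a column we do NOT want to group by
)

# Name the grouping variables explicitly instead of using `.`,
# so that `extra` is ignored.
res <- aggregate(score ~ id + cat.1 + cat.2,
                 data = data2,
                 FUN = function(x) sum(x == -1))

# Optional: sort by id and give the count column a clearer name.
res <- res[order(res$id), ]
names(res)[names(res) == "score"] <- "count_neg_1"
res
```

With score ~ . the extra column would also become a grouping variable, and since it is unique per row, every row would end up in its own group.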

Answer 2: (score: 4)

library(data.table)
setDT(data)[ , sum(score == -1), by=c('id', 'cat.1', 'cat.2')]
#    id cat.1 cat.2 V1
# 1:  1     a     m  2
# 2:  2     b     f  0
# 3:  3     c     m  1
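The result column above gets the default name V1. A small variation, sketched here under the assumption that a named column is preferred, wraps the expression in .() to name it; the name count_neg_1 is just an illustrative choice, and the table is rebuilt as dt so the snippet runs on its own.

```r
library(data.table)

# Rebuild the example data directly as a data.table.
dt <- data.table(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1)
)

# .() in j names the computed column; .() in by lists the grouping columns.
res <- dt[, .(count_neg_1 = sum(score == -1)), by = .(id, cat.1, cat.2)]
res
```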

Answer 3: (score: 0)

Another option is count, using the logical score == -1 as the weight:

library(dplyr)
data %>%
   mutate(score = score == -1) %>% 
   dplyr::count(id, cat.1, cat.2, wt = score)
# A tibble: 3 x 4
#    id cat.1 cat.2     n
#   <dbl> <fct> <fct> <int>
#1     1 a     m         2
#2     2 b     f         0
#3     3 c     m         1