Question

I want to count instances grouped by many categorical variables, using only base R.

Sample data:

create_sample <- function() {sample(LETTERS[1:3], size = 500, replace = TRUE)}
df <- data.frame(
  x1 = create_sample(),
  x2 = create_sample(),
  x3 = create_sample(),
  x4 = create_sample()
)

Normally I would use dplyr and do the following:

df %>% 
  mutate(count = 1) %>% 
  group_by(x1, x2, x3, x4) %>% 
  summarise_all(funs(sum))

to get the data frame output I want:

# A tibble: 55 x 5
# Groups:   x1, x2, x3 [?]
   x1    x2    x3    x4    count
   <fct> <fct> <fct> <fct> <dbl>
 1 A     A     A     A      3.00
 2 A     A     B     A      1.00
 3 A     A     B     B      1.00
 4 A     A     B     C      2.00
 5 A     A     C     B      1.00
 6 A     B     A     A      3.00
 7 A     B     A     B      2.00
 8 A     B     A     C      1.00

But now I am restricted to base R for data manipulation. One option I tried is:

as.data.frame(table(df$x1, df$x2, df$x3, df$x4))

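Besides producing output that is far too large, because it enumerates every combination including those with a count of 0, it runs for a very long time and even crashes R when I have larger data.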

Is there a reasonable base R way to count over multiple groups?

Answer 1

We can use count from dplyr:

df %>% 
   count(!!!rlang::syms(names(.)))

In base R, we can use aggregate:

aggregate(n ~ ., transform(df, n = 1), sum)
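
To make it easier to see what this call does, here is a minimal self-contained sketch. The seed and the smaller sample size are assumptions added only so the run is reproducible; they are not part of the original question. `transform(df, n = 1)` adds a column of ones, and the formula `n ~ .` sums that column within each combination of the remaining columns.

```r
# Reproducible toy data (the seed and size of 50 are illustrative assumptions)
set.seed(1)
create_sample <- function(n = 50) sample(LETTERS[1:3], size = n, replace = TRUE)
df <- data.frame(
  x1 = create_sample(),
  x2 = create_sample(),
  x3 = create_sample(),
  x4 = create_sample()
)

# Add a column of ones, then sum it within each combination of x1..x4
counts <- aggregate(n ~ ., transform(df, n = 1), sum)

head(counts)

# Only combinations that actually occur are returned, so there are no
# zero-count rows, and the counts add up to the number of input rows
all(counts$n > 0)
sum(counts$n) == nrow(df)
```

Unlike `as.data.frame(table(...))`, the result has one row per observed combination rather than one row per possible combination.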

or

aggregate(cbind(n = rep(1, nrow(df))) ~ ., df, sum)
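
This second form leaves `df` unmodified: the `cbind(n = ...)` on the left-hand side builds the column of ones on the fly and gives the result column its name. As a rough sketch, the same count can also be written with the non-formula interface of `aggregate`, and cross-checked against `table()` with the zero cells dropped. The two-column toy data and the seed below are assumptions for illustration only.

```r
# Small reproducible example (seed and size are illustrative assumptions)
set.seed(1)
df <- data.frame(
  x1 = sample(LETTERS[1:3], 50, replace = TRUE),
  x2 = sample(LETTERS[1:3], 50, replace = TRUE)
)

# Formula interface: cbind() on the left-hand side names the count column
a <- aggregate(cbind(n = rep(1, nrow(df))) ~ ., df, sum)

# Non-formula interface: x holds the values to sum, by holds the grouping columns
b <- aggregate(x = list(n = rep(1, nrow(df))), by = df, FUN = sum)

# Cross-check: table() counts every possible combination, so drop the zeros
tab <- subset(as.data.frame(table(df)), Freq > 0)

a
b
tab
```

The `table()` route is fine for a cross-check on small data, but it still allocates a cell for every possible combination of levels before the zeros are filtered out, which is the cost the question describes on larger data.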
