Question

I have a data frame (df) with three columns, as shown below:

Structure:

id id1 age
A1 a1  32
A1 a2  45
A1 a3  45
A1 a4  12
A2 b1  15
A2 b5  34
A2 b64 17

Expected output:

id count count1
A1 4     1
A2 3     2

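Logic:

The column "count" is the number of times each "id" occurs.
The column "count1" is the number of rows in that group with age less than 21.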

Current code:

library(dplyr)
df_summarized <- df %>% 
                     group_by(id) %>% 
                     summarise(count = n(),count1 = count(age<21))

Problem:

Error: no applicable method for 'group_by_' applied to an object of class "logical"

Answer 1

We need sum() instead of count():

df %>%
    group_by(id) %>%
    summarise(count = n(), count1 = sum(age < 21))
# A tibble: 2 × 3
#     id count count1
#  <chr> <int>  <int>
#1    A1     4      1
#2    A2     3      2

count() applies to a data.frame or tbl_df, not to a single column inside summarise().
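The reason sum() works here can be shown in a minimal sketch. The data.frame() construction below is an assumption (the question only shows the printed table), and tapply() is used purely for illustration so that the snippet runs without dplyr:

```r
# Rebuild the example data from the question (construction is assumed).
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  id1 = c("a1", "a2", "a3", "a4", "b1", "b5", "b64"),
  age = c(32, 45, 45, 12, 15, 34, 17),
  stringsAsFactors = FALSE
)

# A comparison returns a logical vector; sum() treats TRUE as 1 and
# FALSE as 0, so summing it counts the rows that meet the condition.
under_21 <- df$age < 21
total_under_21 <- sum(under_21)

# The same idea applied per group of id.
per_group <- tapply(df$age < 21, df$id, sum)
print(per_group)
```

Inside summarise(), sum(age < 21) does the same thing once per group, which is why no call to count() is needed.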

Or using data.table:

library(data.table)
setDT(df)[, .(count = .N, count1 = sum(age < 21)), id]

Or with base R:

cbind(count = rowSums(table(df[-2])),
      count1 = as.vector(rowsum(+(df$age < 21), df$id)))
#   count count1
#A1     4      1
#A2     3      2

Or with aggregate():

do.call(data.frame, aggregate(age~id, df, FUN =
            function(x) c(count = length(x), count1 = sum(x<21))))

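Note: all of the methods above give proper columns in the result. This matters especially for aggregate(): its output column is a matrix, which is why do.call(data.frame, ...) is used to convert it into proper columns.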
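A short sketch of what the note describes, assuming the same example data as in the question (only the id and age columns are rebuilt here, and the variable names res and flat are made up for illustration):

```r
# Example data (assumed construction; only the columns used are included).
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

# aggregate() with a function that returns two values per group.
res <- aggregate(age ~ id, df,
                 FUN = function(x) c(count = length(x), count1 = sum(x < 21)))

# The result has only two columns: id, plus a single matrix column "age"
# that holds both count and count1.
print(ncol(res))
print(is.matrix(res$age))

# do.call(data.frame, ...) spreads the matrix into ordinary columns.
flat <- do.call(data.frame, res)
print(names(flat))
```

Without the do.call() step the printed result looks the same, but res$age.count does not exist as a column, which can be surprising in later code.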

Answer 2

Using base R, we can use aggregate() to find the number of rows in each group (id) as well as the number of rows with a value less than 21:

aggregate(age~id, df, function(x) c(count = length(x), 
                                    count1 = length(x[x < 21])))

#  id age.count age.count1
#1 A1         4          1
#2 A2         3          2
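One difference from the sum(x < 21) variant may be worth knowing, although it does not affect the example data, which has no missing ages. The sketch below uses a made-up vector x containing an NA to show how the two expressions diverge:

```r
# Hypothetical ages with one missing value (not from the question's data).
x <- c(12, NA, 30)

# Indexing with a logical NA keeps an NA element, so it is counted.
n_length <- length(x[x < 21])

# sum() propagates the NA unless it is told to drop missing values.
n_sum <- sum(x < 21)
n_sum_narm <- sum(x < 21, na.rm = TRUE)

print(c(n_length = n_length, n_sum = n_sum, n_sum_narm = n_sum_narm))
```

If the age column can contain NA, sum(x < 21, na.rm = TRUE) is probably the safer choice for count1.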

Conditional count in a data frame
