Imagine the following table, named DT:
ID  Path  Status
AA  XXX   Completed
AB  XXX   Completed
AC  XXX   In progress
AD  XYY   Completed
AE  XYY   In progress
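I want to group this table by Path and compute (1) the number of unique IDs and (2) the number of unique IDs whose Status is "Completed" (the original table DT contains no duplicate IDs).
I tried the following code: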
DT_Grouped <- DT %>%
group_by(Path) %>%
summarise(CountComplete = sum(DT$Status == "Completed"), Count=n())
This produces the following result:
Path  CountComplete  Count
XXX               3      3
XYY               3      2
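CountComplete always gives the total number of unique IDs with status "Completed" across the whole table; it is not grouped by Path. That makes sense logically, because the calculation refers to the original table rather than the grouped dataset.
How should I modify the code so that CountComplete is computed per Path group?
Thanks in advance for your help.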
Answer 0 (score: 1)
The reason is that DT$ pulls the entire Status column from the original dataset instead of the Status values within each group:
sum(DT$Status == "Completed")
    ^^^
It should be:
library(dplyr)
DT_Grouped <- DT %>%
group_by(Path) %>%
summarise(CountComplete = sum(Status == "Completed"), Count=n())
DT_Grouped
# A tibble: 2 x 3
# Path CountComplete Count
# <chr> <int> <int>
#1 XXX 2 3
#2 XYY 1 2
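To see the difference directly, the following minimal sketch compares the length of the vector that each expression sees inside summarise(). The column names FullColumn and GroupOnly are made up for this illustration; the data is the sample table from the question.

```r
library(dplyr)

# Sample data from the question
DT <- data.frame(
  ID     = c("AA", "AB", "AC", "AD", "AE"),
  Path   = c("XXX", "XXX", "XXX", "XYY", "XYY"),
  Status = c("Completed", "Completed", "In progress",
             "Completed", "In progress"),
  stringsAsFactors = FALSE
)

check <- DT %>%
  group_by(Path) %>%
  summarise(
    FullColumn = length(DT$Status),  # bypasses the grouping: whole column
    GroupOnly  = length(Status)      # only the rows of the current group
  )
check
```

FullColumn is 5 for both paths, while GroupOnly is 3 for XXX and 2 for XYY, which is why the bare column name is needed for a per-group sum.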
If it is a data.table, the corresponding approach would be:
library(data.table)
setDT(DT)[, .(CountComplete = sum(Status == "Completed"), Count = .N), by = Path]
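Both versions above count rows, which equals the number of unique IDs only because DT has no duplicate IDs. If duplicates could occur, one possible sketch is to count distinct IDs with data.table's uniqueN() (the dplyr counterpart would be n_distinct()). The table DT_dup below is hypothetical data invented for this illustration, with ID "AA" repeated.

```r
library(data.table)

# Hypothetical data: same as DT, but with the "AA" row duplicated
DT_dup <- data.table(
  ID     = c("AA", "AA", "AB", "AC", "AD", "AE"),
  Path   = c("XXX", "XXX", "XXX", "XXX", "XYY", "XYY"),
  Status = c("Completed", "Completed", "Completed",
             "In progress", "Completed", "In progress")
)

res <- DT_dup[, .(
  # distinct IDs among the completed rows of the group
  CountComplete = uniqueN(ID[Status == "Completed"]),
  # distinct IDs in the group, regardless of status
  Count = uniqueN(ID)
), by = Path]
res
```

With this input, the duplicated "AA" row is counted once, so the result matches the one shown for the original DT.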
Data
DT <- structure(list(
  ID = c("AA", "AB", "AC", "AD", "AE"),
  Path = c("XXX", "XXX", "XXX", "XYY", "XYY"),
  Status = c("Completed", "Completed", "In progress", "Completed", "In progress")),
  class = "data.frame", row.names = c(NA, -5L))