I have a data frame:
ID SUB_ID Action
1 A Open
1 A Download
1 A Close
1 B Open
1 B Search
1 B Download
1 B Close
2 AA Open
2 AA Download
2 AA Close
2 BB Open
2 BB Search
2 BB Filter
2 BB Close
3 C Open
3 C Search
3 C Filter
3 C Close
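I want a table with each ID, the number of distinct SUB_ID values per ID, and the number of "Download" entries in the Action column across that ID's SUB_IDs. So the expected result is: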
ID SUB_ID_n Download_n
1 2 2
2 2 1
3 1 0
How can I do this?
Answer 0 (score: 0)
Use n_distinct to count the number of unique values, and sum the logical vector Action == 'Download' to count the matching rows.
library(dplyr)
df %>%
group_by(ID) %>%
summarise(SUB_ID_n = n_distinct(SUB_ID, na.rm = TRUE),
Download_n = sum(Action == 'Download'))
# ID SUB_ID_n Download_n
# <int> <int> <int>
#1 1 2 2
#2 2 2 1
#3 3 1 0
In data.table, this can be written as:
library(data.table)
setDT(df)[, .(SUB_ID_n = uniqueN(SUB_ID, na.rm = TRUE),
Download_n = sum(Action == 'Download')), ID]
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), SUB_ID = c("A", "A", "A", "B",
"B", "B", "B", "AA", "AA", "AA", "BB", "BB", "BB", "BB", "C",
"C", "C", "C"), Action = c("Open", "Download", "Close", "Open",
"Search", "Download", "Close", "Open", "Download", "Close", "Open",
"Search", "Filter", "Close", "Open", "Search", "Filter", "Close"
)), class = "data.frame", row.names = c(NA, -18L))
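For comparison, the same summary can be sketched in base R without any extra packages. This is not part of the original answer: the name res and the compact rebuild of df with rep() are assumptions made so the snippet runs on its own. The idea is the same as above: split the rows by ID, count the unique SUB_ID values, and sum the logical vector Action == "Download".

```r
# Rebuild the question's 18 rows compactly (assumed equivalent to the dput output)
df <- data.frame(
  ID = rep(1:3, times = c(7, 7, 4)),
  SUB_ID = rep(c("A", "B", "AA", "BB", "C"), times = c(3, 4, 3, 4, 4)),
  Action = c("Open", "Download", "Close",
             "Open", "Search", "Download", "Close",
             "Open", "Download", "Close",
             "Open", "Search", "Filter", "Close",
             "Open", "Search", "Filter", "Close"),
  stringsAsFactors = FALSE
)

# Split the rows by ID and build one summary row per group
res <- do.call(rbind, lapply(split(df, df$ID), function(g) {
  data.frame(
    ID = g$ID[1],
    SUB_ID_n = length(unique(g$SUB_ID)),        # distinct SUB_ID values in this ID
    Download_n = sum(g$Action == "Download")    # TRUE counts as 1, FALSE as 0
  )
}))
rownames(res) <- NULL  # drop the group names that rbind adds as row names
res
```

If Action can contain NA, sum() returns NA for that group unless na.rm = TRUE is passed; the same applies to the dplyr and data.table versions.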