Add columns with counts of specific values within groups of a data frame

Date: 2020-09-21 13:06:03

Tags: r dataframe

I have a data frame:

ID   SUB_ID   Action
1    A        Open
1    A        Download
1    A        Close
1    B        Open
1    B        Search
1    B        Download
1    B        Close
2    AA       Open
2    AA       Download
2    AA       Close
2    BB       Open
2    BB       Search
2    BB       Filter
2    BB       Close
3    C        Open
3    C        Search
3    C        Filter
3    C        Close

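I want a table with each ID, the number of distinct SUB_ID values for that ID, and the number of "Download" values in the Action column within its SUB_IDs. So the desired result is: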

ID       SUB_ID_n     Download_n 
1         2            2
2         2            1
3         1            0

How could I do that?

1 answer:

Answer 0 (score: 0)

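Use n_distinct to count the number of unique SUB_ID values in each group, and sum the logical vector Action == 'Download' to count the rows where Action is 'Download'.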

library(dplyr)

df %>%
  group_by(ID) %>%                                          # one output row per ID
  summarise(SUB_ID_n = n_distinct(SUB_ID, na.rm = TRUE),    # distinct SUB_ID values in the group
            Download_n = sum(Action == 'Download'))         # rows where Action is 'Download'

#    ID SUB_ID_n Download_n
#  <int>    <int>      <int>
#1     1        2          2
#2     2        2          1
#3     3        1          0
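
The logical-sum step used for Download_n can be seen in isolation in the minimal sketch below. The `action` vector is made up for illustration and is not part of the question's data; it includes an NA only to show what would happen if Action had missing values, which the data shown here does not.

```r
# Hypothetical vector for illustration; not taken from df.
action <- c("Open", "Download", "Close", "Download", NA)

# Comparing with a string gives a logical vector (NA stays NA).
is_download <- action == "Download"

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
# A missing value makes the plain sum NA; na.rm = TRUE skips it.
n_plain <- sum(is_download)
n_safe  <- sum(is_download, na.rm = TRUE)
```

If Action could contain missing values, adding na.rm = TRUE to the sum() call in the summarise step would probably be the safer choice.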

In data.table, the same thing can be written as:

library(data.table)
# setDT() converts df to a data.table by reference; by = ID groups the rows by ID
setDT(df)[, .(SUB_ID_n = uniqueN(SUB_ID, na.rm = TRUE), 
              Download_n = sum(Action == 'Download')), by = ID]
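
For comparison, the same two counts can also be computed without any packages. This is only a sketch of one possible base R approach, not part of the original answer; the small data frame `d` and the result name `res` are hypothetical stand-ins with the same columns as df.

```r
# Hypothetical data with the same columns as df in the question.
d <- data.frame(
  ID     = c(1L, 1L, 1L, 2L, 2L),
  SUB_ID = c("A", "A", "B", "C", "C"),
  Action = c("Open", "Download", "Download", "Open", "Close"),
  stringsAsFactors = FALSE
)

# Split the rows by ID, compute both counts for each piece,
# then bind the one-row results back together.
res <- do.call(rbind, lapply(split(d, d$ID), function(g) {
  data.frame(
    ID         = g$ID[1],
    SUB_ID_n   = length(unique(g$SUB_ID)),      # distinct SUB_ID values
    Download_n = sum(g$Action == "Download")    # rows where Action is 'Download'
  )
}))
rownames(res) <- NULL
```

Unlike n_distinct and uniqueN with na.rm = TRUE, length(unique(...)) counts a missing SUB_ID as a value of its own, so the results would differ if SUB_ID contained NA.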

Data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), SUB_ID = c("A", "A", "A", "B", 
"B", "B", "B", "AA", "AA", "AA", "BB", "BB", "BB", "BB", "C", 
"C", "C", "C"), Action = c("Open", "Download", "Close", "Open", 
"Search", "Download", "Close", "Open", "Download", "Close", "Open", 
"Search", "Filter", "Close", "Open", "Search", "Filter", "Close"
)), class = "data.frame", row.names = c(NA, -18L))