Once a user completes the digital signing step, the column is_digitally_signed becomes YES.
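What I am trying to do: if any step was digitally signed, I want to retrieve all rows that share the same application_id and user_id. Please see my desired output below.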
R code to reproduce my dataset
library(data.table)
library(dplyr)

# Three applications with three status rows each; the short vectors are repeated explicitly
df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3),
                 user_id = c(123,123,123,456,456,456,789,789,789),
                 application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
                 date = rep(c("01/01/2018", "02/01/2018", "03/01/2018"), times = 3),
                 is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
  mutate(date = as.Date(date, "%d/%m/%Y"))
which gives the output
df
application_id user_id application_status date       is_digitally_signed
1              123     incomplete         2018-01-01 NULL
1              123     details_verified   2018-01-02 NULL
1              123     complete           2018-01-03 YES
2              456     incomplete         2018-01-01 NULL
2              456     details_verified   2018-01-02 NULL
2              456     complete           2018-01-03 NULL
3              789     incomplete         2018-01-01 NULL
3              789     details_verified   2018-01-02 YES
3              789     complete           2018-01-03 NULL
My (failed) attempt
df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))
Desired result
application_id user_id application_status date       is_digitally_signed
1              123     incomplete         2018-01-01 NULL
1              123     details_verified   2018-01-02 NULL
1              123     complete           2018-01-03 YES
3              789     incomplete         2018-01-01 NULL
3              789     details_verified   2018-01-02 YES
3              789     complete           2018-01-03 NULL
Answer 0 (score: 3)
We can use filter together with any, which checks whether a given group has at least one record with is_digitally_signed == 'YES':
library(dplyr)
df %>%
group_by(application_id, user_id) %>%
filter(any(is_digitally_signed == "YES"))
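To see why this works, it can help to look at the single value that any() produces for each group before filter() uses it. The following is only a minimal sketch on a small stand-in data frame; toy, group_flags, has_signature and kept are illustrative names that do not appear in the question.

```r
library(dplyr)

# Small stand-in for the question's data (hypothetical, fewer rows and columns)
toy <- data.frame(
  application_id = c(1, 1, 2, 2),
  user_id = c(123, 123, 456, 456),
  is_digitally_signed = c("NULL", "YES", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

# any() collapses each group to one TRUE/FALSE value
group_flags <- toy %>%
  group_by(application_id, user_id) %>%
  summarise(has_signature = any(is_digitally_signed == "YES"), .groups = "drop")

# Inside filter(), that single value is recycled to every row of the group,
# so a group is either kept whole or dropped whole
kept <- toy %>%
  group_by(application_id, user_id) %>%
  filter(any(is_digitally_signed == "YES")) %>%
  ungroup()
```

Here group_flags should hold one row per (application_id, user_id) pair, and kept should contain only the rows of the pair whose flag is TRUE.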
Or use the all function to keep only the groups in which not every row has is_digitally_signed == "NULL":
df %>%
group_by(application_id, user_id) %>%
filter(!all(is_digitally_signed == "NULL"))
Since you have already loaded the data as a data.table, we can also use data.table:
library(data.table)
dt = setDT(df)
dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]
Or using .SD:
dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]
Output:
# A tibble: 6 x 5
# Groups: application_id, user_id [2]
application_id user_id application_status date is_digitally_signed
<dbl> <dbl> <fct> <date> <fct>
1 1 123 incomplete 2018-01-01 NULL
2 1 123 details_verified 2018-01-02 NULL
3 1 123 complete 2018-01-03 YES
4 3 789 incomplete 2018-01-01 NULL
5 3 789 details_verified 2018-01-02 YES
6 3 789 complete 2018-01-03 NULL
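As a further variation on the data.table approach in this answer, a commonly used idiom returns .SD only for groups that pass the test. This is a sketch on a hypothetical stand-in table (toy and kept are illustrative names), not something taken from the answer itself.

```r
library(data.table)

# Small stand-in for the question's data (hypothetical)
toy <- data.table(
  application_id = c(1, 1, 2, 2),
  user_id = c(123, 123, 456, 456),
  is_digitally_signed = c("NULL", "YES", "NULL", "NULL")
)

# For each group: return all of its rows (.SD) if any row is "YES".
# With no else branch the expression yields NULL for the other groups,
# and data.table omits those groups from the result.
kept <- toy[, if (any(is_digitally_signed == "YES")) .SD,
            by = .(application_id, user_id)]
```

Compared with the .I version, this avoids the second lookup step, at the cost of materialising .SD for each group.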
Answer 1 (score: 3)
Since there is only one column to test, we can simply use filter with any
library(dplyr)
df %>%
group_by(application_id,user_id) %>%
filter(any(is_digitally_signed == "YES"))
# A tibble: 6 x 5
# Groups: application_id, user_id [2]
# application_id user_id application_status date is_digitally_signed
# <dbl> <dbl> <chr> <date> <chr>
#1 1 123 incomplete 2018-01-01 NULL
#2 1 123 details_verified 2018-01-02 NULL
#3 1 123 complete 2018-01-03 YES
#4 3 789 incomplete 2018-01-01 NULL
#5 3 789 details_verified 2018-01-02 YES
#6 3 789 complete 2018-01-03 NULL
Another option is to use %in%, which returns a single TRUE/FALSE value that is then recycled across the rows of the group
df %>%
group_by(application_id,user_id) %>%
filter("YES" %in% is_digitally_signed)
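The single-value behaviour of %in% can be checked in plain R outside of any pipeline. The vectors below are made-up examples (signed, unsigned and with_na are illustrative names), and the last part shows one way %in% differs from any() when real NA values are present, assuming the column could contain them.

```r
signed   <- c("NULL", "YES", "NULL")
unsigned <- c("NULL", "NULL", "NULL")

# With a length-one left-hand side, %in% returns one logical:
# "is this value found anywhere in the vector?"
flag_signed   <- "YES" %in% signed
flag_unsigned <- "YES" %in% unsigned

# With a real NA in the data, %in% still answers FALSE,
# whereas any(x == "YES") evaluates to NA
with_na    <- c(NA, "NULL")
in_result  <- "YES" %in% with_na
any_result <- any(with_na == "YES")
```

Inside filter() both forms drop such a group, because filter() keeps only rows where the condition is TRUE.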
Or we can use base R
df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]
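The base R line above relies on ave() returning a value for every row rather than one per group. A minimal sketch on a hypothetical stand-in data frame (toy, keep_row and kept are illustrative names) makes the intermediate logical vector visible:

```r
# Small stand-in for the question's data (hypothetical)
toy <- data.frame(
  application_id = c(1, 1, 2, 2),
  user_id = c(123, 123, 456, 456),
  is_digitally_signed = c("NULL", "YES", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each (application_id, user_id) group and writes
# the result back to every row of that group, so the output has one
# logical per row of toy
keep_row <- with(toy, ave(is_digitally_signed == "YES",
                          application_id, user_id, FUN = any))

# Use the row-level logical vector as an ordinary row index
kept <- toy[keep_row, ]
```

Because the index is an ordinary logical vector, this version needs no packages beyond base R.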