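For different combinations of user_status_1, user_status_2 and application_status == 'complete', I have created a final status, final_status. I want to apply the same final_status to all rows that share the same application_id and user_id. Please see the desired result below.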
My dataset:
library(data.table)
library(dplyr)

df <- data.table(application_id = rep(c(1, 2, 3), each = 3),
                 user_id = rep(c(123, 456, 789), each = 3),
                 # every application goes through the same three dates and statuses
                 date = rep(c("01/01/2018", "02/01/2018", "03/01/2018"), times = 3),
                 application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
                 user_status_1 = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
                 user_status_2 = c("a", "b", "c", "d", "e", "f", "g", "h", "i")) %>%
  # dates are in day/month/year format
  mutate(date = as.Date(date, "%d/%m/%Y"))
which gives:
application_id user_id date application_status user_status_1 user_status_2
1 123 2018-01-01 incomplete x a
1 123 2018-01-02 details_verified y b
1 123 2018-01-03 complete z c
2 456 2018-01-01 incomplete x d
2 456 2018-01-02 details_verified y e
2 456 2018-01-03 complete z f
3 789 2018-01-01 incomplete x g
3 789 2018-01-02 details_verified y h
3 789 2018-01-03 complete z i
My failed attempt:
df %>% group_by(application_id, user_id) %>%
mutate(final_status = case_when(any(
application_status == "complete" & user_status_1 == "z" & user_status_2 == "c" ~ "good",
application_status == "complete" & user_status_1 == "z" & user_status_2 == "f" ~ "great",
application_status == "complete" & user_status_1 == "z" & user_status_2 == "i" ~ "excellent"
)))
Desired result:
application_id user_id date application_status user_status_1 user_status_2 final_status
1 123 2018-01-01 incomplete x a good
1 123 2018-01-02 details_verified y b good
1 123 2018-01-03 complete z c good
2 456 2018-01-01 incomplete x d great
2 456 2018-01-02 details_verified y e great
2 456 2018-01-03 complete z f great
3 789 2018-01-01 incomplete x g excellent
3 789 2018-01-02 details_verified y h excellent
3 789 2018-01-03 complete z i excellent
Answer 0 (score: 1)
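You were close: you just need to wrap each logical condition in its own any() call, instead of wrapping the whole set of conditions in a single any().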
df %>%
  group_by(application_id, user_id) %>%
  # each any() reduces its condition to one TRUE/FALSE per group,
  # so every row of a group receives the same label
  mutate(final_status = case_when(
    any(application_status == "complete" & user_status_1 == "z" & user_status_2 == "c") ~ "good",
    any(application_status == "complete" & user_status_1 == "z" & user_status_2 == "f") ~ "great",
    any(application_status == "complete" & user_status_1 == "z" & user_status_2 == "i") ~ "excellent"
  ))
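To see why wrapping each condition in any() works, here is a minimal sketch on hypothetical toy data (the names toy, g, done, any_done and label are made up for illustration). Inside a grouped mutate(), any() collapses a group's logical vector to a single TRUE or FALSE, so case_when() returns one value per group and mutate() recycles it to every row of that group.

```r
library(dplyr)

# Hypothetical toy data: group 1 has one "done" row, group 2 has none
toy <- tibble(g = c(1, 1, 2, 2),
              done = c(FALSE, TRUE, FALSE, FALSE))

toy_flagged <- toy %>%
  group_by(g) %>%
  mutate(
    # any() collapses each group's logical vector to a single TRUE/FALSE
    any_done = any(done),
    # so case_when() yields one value per group, recycled to all its rows
    label = case_when(any(done) ~ "has done row",
                      TRUE ~ "no done row")
  ) %>%
  ungroup()

toy_flagged
```

Without the group_by() step, any() would look at the whole column at once and every row would get the same label.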
Answer 1 (score: 0)
Here is an option that first creates a named vector:
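library(data.table)

# lookup vector: pasted status columns -> final label
nm1 <- setNames(c('good', 'great', 'excellent'),
                c('completezc', 'completezf', 'completezi'))
# paste columns 4:6 (application_status, user_status_1, user_status_2) row-wise;
# df[, 4:6] selects columns for both a data.frame and a data.table
nm2 <- do.call(paste0, df[, 4:6])
# look up each row's key (NA where there is no match), then, within each
# application/user group, give every row the group's single non-NA label
setDT(df)[, final_status := nm1[nm2]][,
          final_status := final_status[complete.cases(final_status)],
          .(application_id, user_id)]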
df
# application_id user_id date application_status user_status_1 user_status_2 final_status
#1: 1 123 2018-01-01 incomplete x a good
#2: 1 123 2018-01-02 details_verified y b good
#3: 1 123 2018-01-03 complete z c good
#4: 2 456 2018-01-01 incomplete x d great
#5: 2 456 2018-01-02 details_verified y e great
#6: 2 456 2018-01-03 complete z f great
#7: 3 789 2018-01-01 incomplete x g excellent
#8: 3 789 2018-01-02 details_verified y h excellent
#9: 3 789 2018-01-03 complete z i excellent
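The two data.table steps above rely on how a named vector behaves when indexed by name. A minimal sketch, using the same nm1 and a hypothetical set of keys for one group (keys, looked_up and group_label are illustrative names): keys missing from names(nm1) come back as NA, and keeping only the non-missing value leaves one label per group, which := then recycles to every row of that group.

```r
# Same lookup vector as in the answer: key = status columns pasted together
nm1 <- setNames(c('good', 'great', 'excellent'),
                c('completezc', 'completezf', 'completezi'))

# Hypothetical keys for one application/user group:
# only the "complete" row has a key that exists in nm1
keys <- c("incompletexa", "details_verifiedyb", "completezc")

# Indexing by name gives NA for keys that are not in names(nm1)
looked_up <- unname(nm1[keys])
looked_up   # c(NA, NA, "good")

# Keeping the non-missing entries leaves a single label for the group
group_label <- looked_up[complete.cases(looked_up)]
group_label # "good"
```

This assumes each group has exactly one matching "complete" row; a group with no match would produce an empty result rather than a label.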
Or, with the tidyverse:
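library(tidyverse)

# reuses the lookup vector nm1 defined above;
# starts from the question's df (without a final_status column)
df %>%
  # paste the three status columns into one key, e.g. "completezc"
  unite(newcol, application_status:user_status_2, sep = "") %>%
  # only the "complete" rows carry a key that exists in nm1
  filter(str_detect(newcol, '^complete')) %>%
  transmute(application_id, user_id, final_status = nm1[newcol]) %>%
  # join the per-group label back onto every original row
  right_join(df, by = c("application_id", "user_id"))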
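A variation on the same join idea, sketched here as an alternative rather than as either answerer's method: keep the rules in a small lookup table with separate columns instead of pasted strings. Pasting without a separator can, in principle, make two different combinations produce the same key (for example "a" + "bc" and "ab" + "c"). The names rules, group_status and result are hypothetical, and the data is rebuilt as a tibble so the block runs on its own.

```r
library(dplyr)

# Rebuild the question's data as a plain tibble (self-contained copy)
df <- tibble(
  application_id     = rep(c(1, 2, 3), each = 3),
  user_id            = rep(c(123, 456, 789), each = 3),
  date               = rep(as.Date(c("2018-01-01", "2018-01-02", "2018-01-03")), times = 3),
  application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
  user_status_1      = rep(c("x", "y", "z"), times = 3),
  user_status_2      = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)

# Hypothetical rule table: status combinations that map to a final status
# when application_status == "complete"
rules <- tribble(
  ~user_status_1, ~user_status_2, ~final_status,
  "z",            "c",            "good",
  "z",            "f",            "great",
  "z",            "i",            "excellent"
)

# 1. Keep only the "complete" rows and look up their final status
group_status <- df %>%
  filter(application_status == "complete") %>%
  inner_join(rules, by = c("user_status_1", "user_status_2")) %>%
  distinct(application_id, user_id, final_status)

# 2. Spread that status to every row of the same application/user pair
result <- df %>%
  left_join(group_status, by = c("application_id", "user_id"))

result
```

If one application/user pair could end up with two different statuses, group_status would hold two rows for it and the final left_join would duplicate that pair's rows, so the rule table should be kept free of overlapping matches.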