I have the following data frame:
filenumber<-c('510-1','510-1','510-2','510-3','510-3')
Year<-c('2017','2018','2018','2018','2019')
outcome<-c('Accepted',"Completed","Accepted","Accepted","Completed")
df<-data.frame(filenumber,Year,outcome)
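I want to make sure that if a given filenumber has the outcome Accepted in some Year, every row associated with that filenumber is labelled "cohort-" followed by the year it was accepted. This is what I have tried so far: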
df%>%group_by(filenumber)%>%mutate(cohort=case_when(Year=='2017' & outcome=='Accepted'~'cohort-2017',
Year=='2018' & outcome=='Accepted'~'cohort-2018'))
filenumber Year outcome cohort
510-1 2017 Accepted cohort-2017
510-1 2018 Completed NA
510-2 2018 Accepted cohort-2018
510-3 2018 Accepted cohort-2018
510-3 2019 Completed NA
However, I want the cohort to apply to every row of a file number that has Accepted as an outcome, so that I end up with this:
filenumber Year outcome cohort
510-1 2017 Accepted cohort-2017
510-1 2018 Completed cohort-2017
510-2 2018 Accepted cohort-2018
510-3 2018 Accepted cohort-2018
510-3 2019 Completed cohort-2018
How can I do this?
Answer 0 (score: 0)
We can use fill from tidyr:
library(dplyr)
library(tidyr)
df %>%
  group_by(filenumber) %>%
  mutate(cohort = case_when(
    Year == '2017' & outcome == 'Accepted' ~ 'cohort-2017',
    Year == '2018' & outcome == 'Accepted' ~ 'cohort-2018'
  )) %>%
  fill(cohort)
# A tibble: 5 x 4
# Groups: filenumber [3]
# filenumber Year outcome cohort
# <fct> <fct> <fct> <chr>
#1 510-1 2017 Accepted cohort-2017
#2 510-1 2018 Completed cohort-2017
#3 510-2 2018 Accepted cohort-2018
#4 510-3 2018 Accepted cohort-2018
#5 510-3 2019 Completed cohort-2018
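One thing to keep in mind: fill only fills downward by default, so the approach above relies on the Accepted row coming first within each filenumber. If the rows might not be sorted that way, a possible variation is to pass .direction = 'downup'. The sketch below uses a made-up file number ('510-9') and a made-up data frame name (df3) purely for illustration, and builds the label with paste0 instead of listing each year by hand.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the Accepted row comes AFTER the Completed row
df3 <- data.frame(
  filenumber = c('510-9', '510-9'),
  Year       = c('2018', '2019'),
  outcome    = c('Completed', 'Accepted'),
  stringsAsFactors = FALSE
)

filled <- df3 %>%
  group_by(filenumber) %>%
  # Label only the Accepted rows; everything else starts out as NA
  mutate(cohort = if_else(outcome == 'Accepted',
                          paste0('cohort-', Year),
                          NA_character_)) %>%
  # Fill down first, then up, within each filenumber
  fill(cohort, .direction = 'downup') %>%
  ungroup()

filled
```

With the default direction, the first row here would stay NA because there is nothing above it to copy from.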
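This can also be simplified. After grouping by 'filenumber', match the string 'Accepted' against 'outcome' to get its numeric index, use that index to subset 'Year', and paste the 'cohort-' string onto the result to create the 'cohort' column: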
library(stringr)
df %>%
group_by(filenumber) %>%
mutate(cohort = str_c('cohort-', Year[match('Accepted', outcome)]))
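As a quick check of how the match version behaves, here is a small sketch that reuses the question's data and adds one hypothetical file ('510-4') that was never accepted. The names df2 and result are made up for this example. match returns the position of the first 'Accepted' in each group, or NA when there is none, and str_c propagates that NA rather than producing a string like "cohort-NA".

```r
library(dplyr)
library(stringr)

# Data from the question, plus a hypothetical file '510-4' with no Accepted row
df2 <- data.frame(
  filenumber = c('510-1', '510-1', '510-2', '510-3', '510-3', '510-4'),
  Year       = c('2017', '2018', '2018', '2018', '2019', '2019'),
  outcome    = c('Accepted', 'Completed', 'Accepted', 'Accepted',
                 'Completed', 'Completed'),
  stringsAsFactors = FALSE
)

result <- df2 %>%
  group_by(filenumber) %>%
  # Year of the first Accepted row in the group (NA if the group has none)
  mutate(cohort = str_c('cohort-', Year[match('Accepted', outcome)])) %>%
  ungroup()

result
```

Unlike the fill approach, this does not depend on the row order within a group, since the accepted year is looked up directly.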