Mutate within a group based on a specific row in that group

Time: 2018-06-20 22:35:52

Tags: r dplyr mutate

Say I have a data frame in person-by-procedure "long" format, holding the start and end dates of each person's procedures:

df <- data.frame(person.id = c(1,1,2,2,3,3),
             start.date = c("2015-01-01", "2015-01-05", "2016-05-06", "2015-04-01", "2015-07-01", "2015-01-06"),
             end.date = c("2015-01-30", "2015-02-05", "2016-06-23", "2015-05-30", "2015-08-10", "2015-02-05"),
             procedure = c("alpha", "beta", "alpha", "beta", "alpha", "beta"))

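How can I create a variable at the person level (i.e. under group_by(person.id)) that holds the start date of that person's "alpha" procedure? I can think of a few longer workarounds, but I'm wondering whether there is an elegant way to do it with group_by and mutate, something like: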

df %<>%
  group_by(person.id) %>%
  mutate(alpha.start.date = #??)

Thanks!

1 answer:

Answer 0 (score: 2)

We can create the variable with mutate by taking the 'start.date' that corresponds to the "alpha" 'procedure' within each group

library(dplyr)
df %>%
  group_by(person.id) %>% 
  # within each person, pick the start.date of the row where procedure is "alpha"
  mutate(alpha.start.date = start.date[procedure == "alpha"])
# A tibble: 6 x 5
# Groups:   person.id [3]
#  person.id start.date end.date   procedure alpha.start.date
#      <dbl> <fct>      <fct>      <fct>     <fct>           
#1         1 2015-01-01 2015-01-30 alpha     2015-01-01      
#2         1 2015-01-05 2015-02-05 beta      2015-01-01      
#3         2 2016-05-06 2016-06-23 alpha     2016-05-06      
#4         2 2015-04-01 2015-05-30 beta      2016-05-06      
#5         3 2015-07-01 2015-08-10 alpha     2015-07-01      
#6         3 2015-01-06 2015-02-05 beta      2015-07-01
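
The logical subsetting above relies on every person having exactly one "alpha" row. If that may not hold in your real data, a slightly more defensive variant could look like the sketch below. It is only an illustration under two assumptions that are not in the question: the dates are converted to Date with as.Date(), and the first "alpha" row is the one you want when a person has several. The name result is just a placeholder.

```r
library(dplyr)

# Same example data as in the question, with dates kept as character
df <- data.frame(
  person.id  = c(1, 1, 2, 2, 3, 3),
  start.date = c("2015-01-01", "2015-01-05", "2016-05-06",
                 "2015-04-01", "2015-07-01", "2015-01-06"),
  end.date   = c("2015-01-30", "2015-02-05", "2016-06-23",
                 "2015-05-30", "2015-08-10", "2015-02-05"),
  procedure  = c("alpha", "beta", "alpha", "beta", "alpha", "beta"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # assumption: work with real Date values rather than strings/factors
  mutate(start.date = as.Date(start.date)) %>%
  group_by(person.id) %>%
  # [1] keeps only the first match, and gives NA if a person has no "alpha" row
  mutate(alpha.start.date = start.date[procedure == "alpha"][1]) %>%
  ungroup()
```

Indexing with [1] always returns a single value per group, so mutate can recycle it across the group's rows whether a person has zero, one, or several "alpha" rows.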