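I have an input table with three columns: Person_Id, Visit_Id (a unique ID for each visit by each person) and Purpose, as shown below. I would like to generate a new column that gives the person's immediately preceding visit. For example, if a person comes to the hospital with Visit_Id = 2, the new column "Preceding_visit_Id" should be 1; if Visit_Id = 5, the preceding visit ID should be 4. Is there an elegant way to do this with the mutate function?
Input table
Output table
Please note that this is a transformation of one column within a larger program, so any neat approach would be helpful.
The dput output is here: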
structure(list(Person_Id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3), Visit_Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14), Purpose = c("checkup", "checkup", "checkup", "checkup",
"checkup", "checkup", "checkup", "checkup", "checkup", "checkup",
"checkup", "checkup", "checkup", "checkup"), Preceding_visit_id = c(NA,
1, 2, 3, 4, NA, 6, 7, 8, 9, 10, NA, 12, 12)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec =
structure(list(
cols = list(Person_Id = structure(list(), class = c("collector_double",
"collector")), Visit_Id = structure(list(), class = c("collector_double",
"collector")), Purpose = structure(list(), class =
c("collector_character",
"collector")), Preceding_visit_id = structure(list(), class =
c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Answer 0 (score: 1)
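The Person_Id field in your example does not match.
I'm not sure whether this is what you need, but starting from your dput() I created a data frame with the last column removed: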
library(dplyr)
# df_output is the tibble recreated from the dput() output in the question
df_input <- df_output %>%
  select(-Preceding_visit_id)
Then do the following:
# Within each person, lag() returns the Visit_Id from the previous row
df_input %>%
  group_by(Person_Id) %>%
  mutate(Preceding_visit_id = lag(Visit_Id))
The output is:
# A tibble: 14 x 4
# Groups:   Person_Id [3]
   Person_Id Visit_Id Purpose Preceding_visit_id
       <dbl>    <dbl> <chr>                <dbl>
 1         1        1 checkup                 NA
 2         1        2 checkup                  1
 3         1        3 checkup                  2
 4         1        4 checkup                  3
 5         1        5 checkup                  4
 6         2        6 checkup                 NA
 7         2        7 checkup                  6
 8         2        8 checkup                  7
 9         2        9 checkup                  8
10         2       10 checkup                  9
11         2       11 checkup                 10
12         3       12 checkup                 NA
13         3       13 checkup                 12
14         3       14 checkup                 13
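Note that a plain lag() relies on the rows already being sorted by visit within each person. If that is not guaranteed in the larger program, a minimal sketch along the following lines may be safer. It assumes that Visit_Id increases with time for each person, and the small shuffled data frame visits is a made-up stand-in for the real data, not the table from the question.

```r
library(dplyr)

# Hypothetical stand-in for the question's data, deliberately shuffled so
# that row order no longer matches visit order.
visits <- data.frame(
  Person_Id = c(1, 1, 1, 2, 2, 3),
  Visit_Id  = c(3, 1, 2, 5, 4, 6),
  Purpose   = "checkup"
)

result <- visits %>%
  group_by(Person_Id) %>%
  # order_by makes lag() follow Visit_Id order instead of row order
  mutate(Preceding_visit_id = lag(Visit_Id, order_by = Visit_Id)) %>%
  ungroup() %>%
  # Sort only for readability; the lag result does not depend on this step
  arrange(Person_Id, Visit_Id)

print(result)
```

The first visit of each person still gets NA, because there is nothing before it. Also note that the dput() in the question stores 12 in the last row of Preceding_visit_id, whereas lag() gives 13, which is what the question's description implies.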