I have a dataset similar to the following:
data <- data.frame(ID = rep(1:5, each = 4),
                   Event = rep(c("SCR", "FUP", "FUP", "FUP"), 5),
                   Date = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
                            "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25",
                            "2018-04-09", "2018-05-02", "2018-05-29", "2018-06-04",
                            "2017-06-06", "2017-07-26", "2017-09-07", "2017-09-15",
                            "2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19"))
I would like to reshape it so that it looks like this:
ID SCR FUP_1 FUP_2 FUP_3
1 2016-11-01 2016-11-10 2016-12-01 2017-01-19
2 2017-04-12 2017-04-04 2017-05-30 2017-05-25
.
.
.
I tried using spread, but it fails with "Error: Duplicate identifiers". I also tried reshape:
reshape(data, idvar = "ID", timevar = "Event", direction = "wide", sep = "_")
But it drops two of the date entries and keeps only the first follow-up date (see the output below):
ID Date_SCR Date_FUP
1 2016-11-01 2016-11-10
2 2017-03-06 2017-04-12
3 2017-05-25 2017-01-19
4 2018-05-29 2018-06-04
5 2017-07-26 2017-09-07
Can anyone help me? Thanks in advance!
Answer 0 (score: 3)
To add the numbers, I would use make.unique. It isn't very pretty, but you can always rename the columns afterwards (or fix the labels beforehand).
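For reference, a small sketch of what make.unique does to one ID's worth of Event values. The vector `events` and the result names are illustrative only; `sep` is a documented argument of make.unique.

```r
# One ID's Event values, as in the question.
events <- c("SCR", "FUP", "FUP", "FUP")

# Default separator ".": the first occurrence of each value is left unchanged,
# later duplicates get a numeric suffix.
default_labels <- make.unique(events)
print(default_labels)

# The separator can be changed with `sep`.
underscore_labels <- make.unique(events, sep = "_")
print(underscore_labels)
```

Because the first follow-up stays unnumbered, the labels come out as FUP, FUP.1, FUP.2 rather than FUP_1, FUP_2, FUP_3, which is why some renaming is still needed to match the requested layout exactly.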
First, the modified data:
data$Event <- ave(as.character(data$Event), data$ID, FUN=make.unique)
head(data)
# ID Event Date
# 1.1 1 SCR 2016-11-01
# 1.2 1 FUP 2016-11-10
# 1.3 1 FUP.1 2016-12-01
# 1.4 1 FUP.2 2017-01-19
# 2.5 2 SCR 2017-04-12
# 2.6 2 FUP 2017-04-04
In base R, with ugly column names:
reshape(data, idvar = "ID", v.names="Date", timevar="Event", direction="wide")
# ID Date.SCR Date.FUP Date.FUP.1 Date.FUP.2
# 1.1 1 2016-11-01 2016-11-10 2016-12-01 2017-01-19
# 2.5 2 2017-04-12 2017-04-04 2017-05-30 2017-05-25
# 3.9 3 2018-04-09 2018-05-02 2018-05-29 2018-06-04
# 4.13 4 2017-06-06 2017-07-26 2017-09-07 2017-09-15
# 5.17 5 2016-11-01 2016-11-10 2016-12-01 2017-01-19
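If the Date. prefix and the FUP, FUP.1, FUP.2 numbering are not wanted, the labels can be built before reshaping. The following is a base R sketch, assuming the only events are SCR and FUP and starting from the question's original data. It replaces make.unique with a running count computed by ave and seq_along, and the names `n` and `wide` are illustrative.

```r
# Rebuild the question's data (dates kept as character strings).
data <- data.frame(
  ID    = rep(1:5, each = 4),
  Event = rep(c("SCR", "FUP", "FUP", "FUP"), 5),
  Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
            "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25",
            "2018-04-09", "2018-05-02", "2018-05-29", "2018-06-04",
            "2017-06-06", "2017-07-26", "2017-09-07", "2017-09-15",
            "2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19"),
  stringsAsFactors = FALSE
)

# Running count of each Event within each ID: SCR -> 1, FUP -> 1, 2, 3.
n <- ave(seq_along(data$Event), data$ID, data$Event, FUN = seq_along)

# Label follow-ups FUP_1, FUP_2, ...; leave SCR unnumbered.
data$Event <- ifelse(data$Event == "FUP", paste0("FUP_", n), data$Event)

wide <- reshape(data, idvar = "ID", timevar = "Event",
                v.names = "Date", direction = "wide")

# Drop the "Date." prefix that reshape() adds, and reset the row names.
names(wide) <- sub("^Date\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

Numbering before the reshape keeps the columns in order of first appearance (ID, SCR, FUP_1, FUP_2, FUP_3), so no reordering step is needed afterwards.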
Tidyverse
tidyr::spread(data, Event, Date)
# ID FUP FUP.1 FUP.2 SCR
# 1 1 2016-11-10 2016-12-01 2017-01-19 2016-11-01
# 2 2 2017-04-04 2017-05-30 2017-05-25 2017-04-12
# 3 3 2018-05-02 2018-05-29 2018-06-04 2018-04-09
# 4 4 2017-07-26 2017-09-07 2017-09-15 2017-06-06
# 5 5 2016-11-10 2016-12-01 2017-01-19 2016-11-01
data.table
data.table::dcast(data, ID ~ Event)
# Using 'Date' as value column. Use 'value.var' to override
# ID FUP FUP.1 FUP.2 SCR
# 1 1 2016-11-10 2016-12-01 2017-01-19 2016-11-01
# 2 2 2017-04-04 2017-05-30 2017-05-25 2017-04-12
# 3 3 2018-05-02 2018-05-29 2018-06-04 2018-04-09
# 4 4 2017-07-26 2017-09-07 2017-09-15 2017-06-06
# 5 5 2016-11-10 2016-12-01 2017-01-19 2016-11-01
Answer 1 (score: 1)
I'm not claiming this is the "best" solution, but it automatically creates those _num suffixes at the end of the Event values.
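library(dplyr)  # provides %>%, group_by(), mutate(), row_number(), ungroup()

# my_data is the data frame from the question (called `data` there).
# Within each ID, number repeated Event values: SCR_1, FUP_1, FUP_2, FUP_3.
split(my_data, my_data$ID) %>%
  lapply(function(.id) {
    group_by(.id, Event) %>%
      mutate(new_event = paste0(Event, "_", row_number())) %>%
      ungroup()
  }) %>%
  purrr::reduce(rbind) %>%
  dplyr::select(-Event) %>%
  as.data.frame()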
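The pipeline above ends with a long data frame that has a new_event column; it does not yet produce one row per ID. Below is a minimal sketch of the remaining step, assuming dplyr and tidyr (1.0.0 or later, for pivot_wider) are installed and starting from the question's original `data`. It numbers the events with a single group_by(ID, Event) instead of split/lapply, which is a simplification rather than the answerer's code, and the name `wide` is illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data (dates kept as character strings).
data <- data.frame(
  ID    = rep(1:5, each = 4),
  Event = rep(c("SCR", "FUP", "FUP", "FUP"), 5),
  Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
            "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25",
            "2018-04-09", "2018-05-02", "2018-05-29", "2018-06-04",
            "2017-06-06", "2017-07-26", "2017-09-07", "2017-09-15",
            "2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  group_by(ID, Event) %>%
  # Number repeated events within each ID: SCR_1, FUP_1, FUP_2, FUP_3.
  mutate(new_event = paste0(Event, "_", row_number())) %>%
  ungroup() %>%
  select(-Event) %>%
  # One column per label, one row per ID.
  pivot_wider(names_from = new_event, values_from = Date) %>%
  # There is only one screening visit per ID, so drop its suffix.
  rename(SCR = SCR_1) %>%
  as.data.frame()

wide
```

Unlike spread, pivot_wider keeps the new columns in order of first appearance by default, so SCR comes before the follow-up columns.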