Reshape data and modify the new column names

Time: 2018-10-23 22:40:26

Tags: r dataframe

I have a dataset similar to the following:

data <- data.frame(ID    = rep(1:5,each=4), 
               Event = rep(c("SCR","FUP","FUP","FUP"),5), 
               Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19", 
                         "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25", 
                         "2018-04-09", "2018-05-02", "2018-05-29", "2018-06-04", 
                         "2017-06-06", "2017-07-26", "2017-09-07", "2017-09-15", 
                         "2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19"))

I would like to somehow get it to look like this:

ID    SCR         FUP_1        FUP_2        FUP_3
1     2016-11-01  2016-11-10   2016-12-01   2017-01-19
2     2017-04-12  2017-04-04   2017-05-30   2017-05-25
       .
       .
       .

I tried using spread, but it gives "Error: Duplicate identifiers". I also tried reshape:

reshape(data, idvar = "ID", timevar = "Event", direction = "wide", sep = "_") 

but it drops two of the date entries for each ID and only keeps the first follow-up date (see the output below):

ID   Date_SCR    Date_FUP
1    2016-11-01  2016-11-10
2    2017-03-06  2017-04-12
3    2017-05-25  2017-01-19
4    2018-05-29  2018-06-04
5    2017-07-26  2017-09-07
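Both failures share one cause: (ID, Event) is not a unique key, because every ID has three FUP rows. spread() cannot decide which FUP date goes into which cell, and reshape() keeps only the first match for each ID. A minimal base R check on the first two IDs of the question's data (the name counts is just for illustration):

```r
# first two IDs of the question's data, with plain character columns
data <- data.frame(ID    = rep(1:2, each = 4),
                   Event = rep(c("SCR", "FUP", "FUP", "FUP"), 2),
                   Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
                             "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25"),
                   stringsAsFactors = FALSE)

# rows per (ID, Event) pair; any count above 1 means the pair cannot serve as a unique key
counts <- table(data$ID, data$Event)
print(counts)
any(counts > 1)  # TRUE: each ID has three FUP rows
```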

Can anyone help me? Thanks in advance!

2 Answers:

Answer 0 (score: 3)

To add the numbers, I'd use make.unique. The result isn't pretty, but you can always rename the columns afterwards (or fix the names beforehand).

First, the modified data:

data$Event <- ave(as.character(data$Event), data$ID, FUN=make.unique)
head(data)
#     ID Event       Date
# 1.1  1   SCR 2016-11-01
# 1.2  1   FUP 2016-11-10
# 1.3  1 FUP.1 2016-12-01
# 1.4  1 FUP.2 2017-01-19
# 2.5  2   SCR 2017-04-12
# 2.6  2   FUP 2017-04-04
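How the renaming works: make.unique() leaves the first occurrence of a string unchanged and appends .1, .2 and so on to later duplicates, and ave() applies it separately within each ID, so the numbering restarts for every subject. A small base R illustration on the event sequence used in the question (the variable names events and ids are only for this example):

```r
events <- c("SCR", "FUP", "FUP", "FUP")

# default separator: first copy unchanged, later copies get .1, .2
make.unique(events)
# returns c("SCR", "FUP", "FUP.1", "FUP.2")

# the sep argument changes the separator
make.unique(events, sep = "_")
# returns c("SCR", "FUP", "FUP_1", "FUP_2")

# ave() calls make.unique() once per ID and puts the results back in the original order
ids <- c(1, 1, 1, 1, 2, 2, 2, 2)
ave(rep(events, 2), ids, FUN = make.unique)
```

Because make.unique() leaves the first FUP unnumbered, the answer's columns come out as FUP, FUP.1, FUP.2 rather than FUP_1, FUP_2, FUP_3.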

Base R, with ugly column names:

reshape(data, idvar = "ID", v.names="Date", timevar="Event", direction="wide")
#      ID   Date.SCR   Date.FUP Date.FUP.1 Date.FUP.2
# 1.1   1 2016-11-01 2016-11-10 2016-12-01 2017-01-19
# 2.5   2 2017-04-12 2017-04-04 2017-05-30 2017-05-25
# 3.9   3 2018-04-09 2018-05-02 2018-05-29 2018-06-04
# 4.13  4 2017-06-06 2017-07-26 2017-09-07 2017-09-15
# 5.17  5 2016-11-01 2016-11-10 2016-12-01 2017-01-19
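A minimal sketch of the "fix the names beforehand" route in base R, assuming every ID has exactly one SCR row and only FUP repeats. The helper number_fups is hypothetical, not part of the answer: it numbers every FUP from 1, so reshape() directly yields the SCR, FUP_1, FUP_2, FUP_3 layout from the question, and only the "Date." prefix that reshape() adds has to be stripped. It uses the first two IDs of the question's data:

```r
data <- data.frame(ID    = rep(1:2, each = 4),
                   Event = rep(c("SCR", "FUP", "FUP", "FUP"), 2),
                   Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
                             "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25"),
                   stringsAsFactors = FALSE)

# hypothetical helper: within one ID, FUP becomes FUP_1, FUP_2, ...; other events unchanged
number_fups <- function(ev) {
  fup <- ev == "FUP"
  ev[fup] <- paste0("FUP_", seq_len(sum(fup)))
  ev
}
data$Event <- ave(data$Event, data$ID, FUN = number_fups)

wide <- reshape(data, idvar = "ID", v.names = "Date",
                timevar = "Event", direction = "wide")
# remove the "Date." prefix added by reshape() and reset the row names
names(wide) <- sub("^Date\\.", "", names(wide))
rownames(wide) <- NULL
wide
```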

Tidyverse

tidyr::spread(data, Event, Date)
#   ID        FUP      FUP.1      FUP.2        SCR
# 1  1 2016-11-10 2016-12-01 2017-01-19 2016-11-01
# 2  2 2017-04-04 2017-05-30 2017-05-25 2017-04-12
# 3  3 2018-05-02 2018-05-29 2018-06-04 2018-04-09
# 4  4 2017-07-26 2017-09-07 2017-09-15 2017-06-06
# 5  5 2016-11-10 2016-12-01 2017-01-19 2016-11-01
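spread() still works but has been superseded by pivot_wider() since tidyr 1.0.0. A sketch of the same step with pivot_wider(), assuming the Event column has been made unique with ave()/make.unique() as above (repeated here, on the first two IDs, so the block runs on its own). Unlike spread(), which sorted the new columns alphabetically in the output above, pivot_wider() orders them by first appearance by default, so SCR stays in front:

```r
library(tidyr)

data <- data.frame(ID    = rep(1:2, each = 4),
                   Event = rep(c("SCR", "FUP", "FUP", "FUP"), 2),
                   Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
                             "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25"),
                   stringsAsFactors = FALSE)
# same renaming as in the answer: SCR, FUP, FUP.1, FUP.2 within each ID
data$Event <- ave(data$Event, data$ID, FUN = make.unique)

# new column names come from Event, cell values from Date
wide <- pivot_wider(data, names_from = Event, values_from = Date)
wide
```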

data.table

data.table::dcast(data, ID ~ Event)
# Using 'Date' as value column. Use 'value.var' to override
#   ID        FUP      FUP.1      FUP.2        SCR
# 1  1 2016-11-10 2016-12-01 2017-01-19 2016-11-01
# 2  2 2017-04-04 2017-05-30 2017-05-25 2017-04-12
# 3  3 2018-05-02 2018-05-29 2018-06-04 2018-04-09
# 4  4 2017-07-26 2017-09-07 2017-09-15 2017-06-06
# 5  5 2016-11-10 2016-12-01 2017-01-19 2016-11-01
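One caveat: the call above passes a plain data.frame. data.table::dcast() is designed for data.tables, and depending on the installed data.table version a data.frame is either handed off to reshape2::dcast() with a deprecation warning or refused, so converting first is the safer choice. A sketch under those assumptions that also produces the question's FUP_1, FUP_2, FUP_3 names, using data.table's rowid() to count rows within each (ID, Event) pair (first two IDs of the question's data):

```r
library(data.table)

data <- data.frame(ID    = rep(1:2, each = 4),
                   Event = rep(c("SCR", "FUP", "FUP", "FUP"), 2),
                   Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
                             "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25"),
                   stringsAsFactors = FALSE)
dt <- as.data.table(data)  # dcast() expects a data.table, not a data.frame

# rowid(ID, Event) counts 1, 2, 3, ... within each ID/Event pair; only FUP gets a suffix
dt[, Event := ifelse(Event == "FUP", paste0("FUP_", rowid(ID, Event)), Event)]

wide <- dcast(dt, ID ~ Event, value.var = "Date")
# dcast() sorts the new columns alphabetically; move SCR back to the front
setcolorder(wide, c("ID", "SCR", "FUP_1", "FUP_2", "FUP_3"))
wide
```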

Answer 1 (score: 1)

I'm not saying this is the "best" solution, but it automatically appends those _num suffixes to the end of the Event values.

library(dplyr)

split(data, data$ID) %>%
  lapply(function(.id) {
    group_by(.id, Event) %>%
      # number repeated events within each ID: SCR_1, FUP_1, FUP_2, FUP_3
      mutate(new_event = paste0(Event, "_", row_number())) %>%
      ungroup()
  }) %>%
  purrr::reduce(rbind) %>%
  dplyr::select(-Event) %>%
  as.data.frame()
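The pipeline above builds the numbered new_event column but stops before the actual reshape, and the split by ID is not strictly needed because group_by() accepts both columns at once. A sketch of a single dplyr/tidyr pipeline that finishes the job; numbering only the FUP rows (so SCR keeps its plain name) is an assumption chosen to match the layout in the question, and the data is the first two IDs of the question's data:

```r
library(dplyr)
library(tidyr)

data <- data.frame(ID    = rep(1:2, each = 4),
                   Event = rep(c("SCR", "FUP", "FUP", "FUP"), 2),
                   Date  = c("2016-11-01", "2016-11-10", "2016-12-01", "2017-01-19",
                             "2017-04-12", "2017-04-04", "2017-05-30", "2017-05-25"),
                   stringsAsFactors = FALSE)

wide <- data %>%
  group_by(ID, Event) %>%
  # row_number() restarts in every ID/Event group, giving FUP_1, FUP_2, FUP_3
  mutate(new_event = if_else(Event == "FUP", paste0(Event, "_", row_number()), Event)) %>%
  ungroup() %>%
  select(-Event) %>%
  # one row per ID, one column per new_event value
  pivot_wider(names_from = new_event, values_from = Date)
wide
```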