Collapse multiple columns into one and generate an index variable

Time: 2015-04-05 07:53:10

Tags: r date

I have three date columns, as shown below:

       Id Date1       Date2         Date3
       12 2005-12-22  NA            NA
       11 2009-10-11  NA            NA
       29 NA          2005-04-11    NA
       45 NA          NA            2008-11-06
       39 NA          NA            2006-01-02
       44 NA          2005-04-16    NA

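I am trying to collapse the three Date columns into a single Date column and create an index variable: 1 if the date comes from Date1, 2 if it comes from Date2, and 3 if it comes from Date3.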

       Id Date        Index
       12 2005-12-22  1
       11 2009-10-11  1
       29 2005-04-11  2
       45 2008-11-06  3       
       39 2006-01-02  3
       44 2005-04-16  2

I could do this with a lot of ifelse statements, but I'm wondering whether anyone knows a more efficient way.

4 answers:

Answer 0 (score: 5)

Use reshape to go from "wide" to "long" format. If d is your data.frame:

d2 <- reshape(d, idvar = "Id", v.names = "Date", timevar = "Index",
              varying = c("Date1", "Date2", "Date3"), direction = "long")

Result:

> d2
     Id Index       Date
12.1 12     1 2005-12-22
11.1 11     1 2009-10-11
29.1 29     1       <NA>
45.1 45     1       <NA>
39.1 39     1       <NA>
44.1 44     1       <NA>
12.2 12     2       <NA>
11.2 11     2       <NA>
29.2 29     2 2005-04-11
45.2 45     2       <NA>
39.2 39     2       <NA>
44.2 44     2 2005-04-16
12.3 12     3       <NA>
11.3 11     3       <NA>
29.3 29     3       <NA>
45.3 45     3 2008-11-06
39.3 39     3 2006-01-02
44.3 44     3       <NA>

If you don't want all the NA values shown above, you can subset:

> d2[!is.na(d2$Date),]
     Id Index       Date
12.1 12     1 2005-12-22
11.1 11     1 2009-10-11
29.2 29     2 2005-04-11
44.2 44     2 2005-04-16
45.3 45     3 2008-11-06
39.3 39     3 2006-01-02
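Putting both steps together, here is a self-contained sketch. It rebuilds the example data inline (the same values as df1 in the Data section, named d here), drops the NA rows and then, as an extra step not in the answer, restores the original row order. The reordering with match() assumes each Id has exactly one date.

```r
# Same values as df1 in the "Data" section, built inline so the block runs on its own
d <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (Id, Index) pair, Index = 1, 2, 3 for Date1..Date3
d2 <- reshape(d, idvar = "Id", v.names = "Date", timevar = "Index",
              varying = c("Date1", "Date2", "Date3"), direction = "long")

# Keep only the rows that actually carry a date
d2 <- d2[!is.na(d2$Date), ]

# Extra step: put rows back in the original Id order (assumes one date per Id)
d2 <- d2[match(d$Id, d2$Id), ]
rownames(d2) <- NULL
d2
```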

Answer 1 (score: 3)

You might consider melting your data.

Here's an example:

library(data.table)
library(reshape2)
melt(as.data.table(mydf), id.vars = "Id", na.rm = TRUE)
#    Id variable      value
# 1: 12    Date1 2005-12-22
# 2: 11    Date1 2009-10-11
# 3: 29    Date2 2005-04-11
# 4: 44    Date2 2005-04-16
# 5: 45    Date3 2008-11-06
# 6: 39    Date3 2006-01-02

## More specific to what you want:
melt(as.data.table(mydf), id.vars = "Id", na.rm = TRUE)[, 
  variable := sub("Date", "", variable)][]
#    Id variable      value
# 1: 12        1 2005-12-22
# 2: 11        1 2009-10-11
# 3: 29        2 2005-04-11
# 4: 44        2 2005-04-16
# 5: 45        3 2008-11-06
# 6: 39        3 2006-01-02
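If you would rather not depend on data.table, roughly the same melt-then-drop-NA idea can be sketched in base R with stack(). This swaps in a different function from the one the answer uses; the data is rebuilt inline as mydf with the values from the Data section.

```r
mydf <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

# stack() puts the Date columns one after another: 'values' plus 'ind' (factor Date1..Date3)
long <- stack(mydf[-1])
# Ids repeat once per stacked column, in the same order stack() used
long$Id <- rep(mydf$Id, times = ncol(mydf) - 1)
# Same effect as na.rm = TRUE in melt()
long <- long[!is.na(long$values), ]
# "Date2" -> 2L, mirroring the sub("Date", "", variable) step above
long$Index <- as.integer(sub("Date", "", long$ind))
long <- data.frame(Id = long$Id, Index = long$Index,
                   Date = as.character(long$values), stringsAsFactors = FALSE)
long
```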

Answer 2 (score: 1)

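You can also use tidyr and create id:

library(tidyr)
df[is.na(df)] <- ''
# unite() pastes Date1..Date3 together; the empty strings contribute nothing.
# max.col() gives, for each row, the column holding the non-empty date.
transform(unite(df, 'Date', Date1:Date3, sep = ''), id = max.col(df[-1] != ''))
#  Id       Date id
#1 12 2005-12-22  1
#2 11 2009-10-11  1
#3 29 2005-04-11  2
#4 45 2008-11-06  3
#5 39 2006-01-02  3
#6 44 2005-04-16  2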
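A tempting shortcut for the index column is ceiling(which(df[-1] != '') / nrow(df)). The catch is that which() walks a matrix column by column, so the recovered column numbers come out sorted by column rather than in row order. A small base-R sketch on a hand-built logical matrix with the same pattern as the example data:

```r
# TRUE where a date is present (rows = the six Ids, columns = Date1..Date3)
has_date <- cbind(
  Date1 = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  Date2 = c(FALSE, FALSE, TRUE,  FALSE, FALSE, TRUE),
  Date3 = c(FALSE, FALSE, FALSE, TRUE,  TRUE,  FALSE)
)

# Linear indices in column-major order: 1 2 9 12 16 17
lin <- which(has_date)

# Recovers the column numbers, but in column order: 1 1 2 2 3 3 (wrong for rows 4-6)
by_column <- ceiling(lin / nrow(has_date))

# Row by row: the column holding the single TRUE in each row: 1 1 2 3 3 2
by_row <- max.col(has_date, ties.method = "first")
```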

Answer 3 (score: 1)

Using base R, we can get the column index of the non-NA value in each row of the 'Date' columns with matrix multiplication:

 indx <- (!is.na(df1[-1])) %*% seq_len(ncol(df1[-1]))
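Why the matrix product works: each row of !is.na(df1[-1]) has exactly one TRUE, so multiplying by 1, 2, 3 keeps only the column number of that TRUE. A minimal sketch on the date columns alone, rebuilt inline with the same NA pattern:

```r
# Date columns only (Id dropped), same NA pattern as df1[-1]
dates <- data.frame(
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

present <- !is.na(dates)          # 6 x 3 logical matrix
weights <- seq_len(ncol(dates))   # 1 2 3

# Row i: sum over j of present[i, j] * j. With one TRUE per row this is
# just the column number of that TRUE; the result is a 6 x 1 matrix.
indx <- present %*% weights
drop(indx)                        # 1 1 2 3 3 2
```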

Or use max.col on the logical matrix !is.na(df1[-1]):

 indx <- max.col(!is.na(df1[-1]))
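Both versions assume each row has exactly one non-NA date. A small sketch with a hypothetical three-row matrix shows what happens otherwise: max.col() with ties.method = "first" quietly returns 1 both for a row with two dates and for a row with none (the default "random" would pick a column at random), while the %*% version returns the sum of the column numbers. Counting dates per row with rowSums() is a cheap check before trusting either index.

```r
# Hypothetical pattern: row 1 has one date, row 2 has two, row 3 has none
present <- rbind(
  c(TRUE,  FALSE, FALSE),
  c(TRUE,  TRUE,  FALSE),
  c(FALSE, FALSE, FALSE)
)

first_hit <- max.col(present, ties.method = "first")    # 1 1 1
weighted  <- drop(present %*% seq_len(ncol(present)))   # 1 3 0

# Flag rows that do not hold exactly one date
n_dates <- rowSums(present)                             # 1 2 0
ok <- n_dates == 1                                      # TRUE FALSE FALSE
```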

Then create a new data.frame with 'Id' from 'df1', 'Date' extracted with the row/column index, and 'Index' from above.

 data.frame(Id=df1[1], Date=df1[-1][cbind(1:nrow(df1[-1]), indx)], Index=indx)
 #  Id       Date Index
 #1 12 2005-12-22     1
 #2 11 2009-10-11     1
 #3 29 2005-04-11     2
 #4 45 2008-11-06     3
 #5 39 2006-01-02     3
 #6 44 2005-04-16     2
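The same base-R steps can be wrapped in a small function so they work for any number of date columns. collapse_dates() is a hypothetical helper, not part of the answer; it adds a guard for the one-date-per-row assumption and uses max.col() with ties.method = "first".

```r
# Hypothetical helper (not from the answer): max.col() + row/column matrix indexing
collapse_dates <- function(df, id_col = 1) {
  dates   <- df[-id_col]
  present <- !is.na(dates)
  # The approach assumes exactly one non-NA date per row; fail loudly otherwise
  stopifnot(all(rowSums(present) == 1))
  indx <- max.col(present, ties.method = "first")
  data.frame(
    Id    = df[[id_col]],
    # Pick element (i, indx[i]) for every row i
    Date  = as.matrix(dates)[cbind(seq_len(nrow(dates)), indx)],
    Index = indx,
    stringsAsFactors = FALSE
  )
}

df1 <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)
res <- collapse_dates(df1)
res
```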

Or use dplyr/tidyr:

 library(dplyr)
 library(tidyr)
 gather(df1, Index, Date, -Id) %>% 
              filter(!is.na(Date)) %>% 
              extract(Index, 'Index', '[^0-9]+([0-9]+)', convert=TRUE)
 #  Id Index       Date
 #1 12     1 2005-12-22
 #2 11     1 2009-10-11
 #3 29     2 2005-04-11
 #4 44     2 2005-04-16
 #5 45     3 2008-11-06
 #6 39     3 2006-01-02

Data

df1 <- structure(list(Id = c(12L, 11L, 29L, 45L, 39L, 44L), 
Date1 = c("2005-12-22", 
"2009-10-11", NA, NA, NA, NA), Date2 = c(NA, NA, "2005-04-11", 
NA, NA, "2005-04-16"), Date3 = c(NA, NA, NA, "2008-11-06",
"2006-01-02", NA)), .Names = c("Id", "Date1", "Date2", "Date3"),
 class = "data.frame", row.names = c(NA, -6L))