I have three date columns, as shown below:
Id Date1 Date2 Date3
12 2005-12-22 NA NA
11 2009-10-11 NA NA
29 NA 2005-04-11 NA
45 NA NA 2008-11-06
39 NA NA 2006-01-02
44 NA 2005-04-16 NA
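I am trying to collapse the three Date columns into a single Date column and create an index variable that is 1 if the date comes from the Date1 column, 2 if it comes from Date2, and 3 if it comes from Date3: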
Id Date Index
12 2005-12-22 1
11 2009-10-11 1
29 2005-04-11 2
45 2008-11-06 3
39 2006-01-02 3
44 2005-04-16 2
I could do this with a lot of ifelse statements, but I was wondering whether anyone knows a more efficient way to do it?
Answer 0 (score: 5)
Use reshape to go from "wide" to "long" format. If d is your data.frame:
d2 <- reshape(d, idvar = "Id", v.names = "Date", timevar = "Index",
varying = c("Date1", "Date2", "Date3"), direction = "long")
Result:
> d2
Id Index Date
12.1 12 1 2005-12-22
11.1 11 1 2009-10-11
29.1 29 1 <NA>
45.1 45 1 <NA>
39.1 39 1 <NA>
44.1 44 1 <NA>
12.2 12 2 <NA>
11.2 11 2 <NA>
29.2 29 2 2005-04-11
45.2 45 2 <NA>
39.2 39 2 <NA>
44.2 44 2 2005-04-16
12.3 12 3 <NA>
11.3 11 3 <NA>
29.3 29 3 <NA>
45.3 45 3 2008-11-06
39.3 39 3 2006-01-02
44.3 44 3 <NA>
If you don't want all the NA values (shown above), you can subset:
> d2[!is.na(d2$Date),]
Id Index Date
12.1 12 1 2005-12-22
11.1 11 1 2009-10-11
29.2 29 2 2005-04-11
44.2 44 2 2005-04-16
45.3 45 3 2008-11-06
39.3 39 3 2006-01-02
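As a self-contained sketch of the steps above: the data.frame d is rebuilt here from the question's table (dates stored as character, which is an assumption), and the last two lines restore the original Id order and column layout. The match() step assumes every Id has exactly one non-NA date.

```r
# Sample data rebuilt from the question (dates kept as character; an assumption)
d <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

# Wide to long: one row per Id and Date column
d2 <- reshape(d, idvar = "Id", v.names = "Date", timevar = "Index",
              varying = c("Date1", "Date2", "Date3"), direction = "long")

# Drop the NA rows and put the columns in the order asked for
res <- d2[!is.na(d2$Date), c("Id", "Date", "Index")]

# Restore the original row order (assumes one non-NA date per Id)
res <- res[match(d$Id, res$Id), ]
rownames(res) <- NULL
res
```

If an Id could carry more than one date, the match() line should be dropped, because it keeps only the first row per Id.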
Answer 1 (score: 3)
You could consider melting your data.
Here is an example:
library(data.table)
library(reshape2)
melt(as.data.table(mydf), id.vars = "Id", na.rm = TRUE)
# Id variable value
# 1: 12 Date1 2005-12-22
# 2: 11 Date1 2009-10-11
# 3: 29 Date2 2005-04-11
# 4: 44 Date2 2005-04-16
# 5: 45 Date3 2008-11-06
# 6: 39 Date3 2006-01-02
## More specific to what you want:
melt(as.data.table(mydf), id.vars = "Id", na.rm = TRUE)[,
variable := sub("Date", "", variable)][]
# Id variable value
# 1: 12 1 2005-12-22
# 2: 11 1 2009-10-11
# 3: 29 2 2005-04-11
# 4: 44 2 2005-04-16
# 5: 45 3 2008-11-06
# 6: 39 3 2006-01-02
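A variant of the melt call above, as a sketch that names the output columns directly and stores the index as an integer. It relies only on data.table's own melt method; mydf is rebuilt from the question's table, and the column names Index and Date are chosen to match the desired output.

```r
library(data.table)

# Sample data rebuilt from the question (dates kept as character; an assumption)
mydf <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

# Melt to long format, dropping NA dates and naming the new columns
long <- melt(as.data.table(mydf), id.vars = "Id", na.rm = TRUE,
             variable.name = "Index", value.name = "Date")

# "Date1" -> 1L, "Date2" -> 2L, "Date3" -> 3L
long[, Index := as.integer(sub("Date", "", Index))]

# Reorder columns to Id, Date, Index
setcolorder(long, c("Id", "Date", "Index"))
long[]
```

The rows come out grouped by source column (Date1 first, then Date2, then Date3), not in the original Id order.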
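Answer 2 (score: 1)
You can also use unite from tidyr and build the id from the position of the non-empty column in each row. max.col works row by row, so the id stays aligned with the Id column:
library(tidyr)
df[is.na(df)] <- ''
transform(unite(df, 'Date', Date1:Date3, sep=''),
id=max.col(df[-1]!=''))
# Id Date id
#1 12 2005-12-22 1
#2 11 2009-10-11 1
#3 29 2005-04-11 2
#4 45 2008-11-06 3
#5 39 2006-01-02 3
#6 44 2005-04-16 2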
Answer 3 (score: 1)
Using base R, we can get the column index of the non-NA value in each row of the 'Date' columns by matrix multiplication:
indx <- (!is.na(df1[-1])) %*% seq_len(ncol(df1[-1]))
Or use max.col on the logical matrix (!is.na(df1[-1])):
indx <- max.col(!is.na(df1[-1]))
Then create a new data.frame using 'Id' from 'df1', 'Date' from the row/column index, and 'Index' from above.
data.frame(Id=df1[1], Date=df1[-1][cbind(1:nrow(df1[-1]), indx)], Index=indx)
# Id Date Index
#1 12 2005-12-22 1
#2 11 2009-10-11 1
#3 29 2005-04-11 2
#4 45 2008-11-06 3
#5 39 2006-01-02 3
#6 44 2005-04-16 2
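The matrix multiplication works because each row of the logical matrix holds a single TRUE, so weighting the columns by 1, 2, 3 and summing returns the position of that TRUE. A minimal sketch with an explicit guard for that assumption (the guard and the names has_date and out are illustrative additions):

```r
# Sample data rebuilt from the question (dates kept as character; an assumption)
df1 <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a date is present
has_date <- !is.na(df1[-1])

# Guard: the index is only meaningful with exactly one date per row.
# With several dates, %*% would return the sum of their column numbers.
stopifnot(all(rowSums(has_date) == 1))

# TRUE/FALSE weighted by the column numbers 1..3 gives the column index
indx <- as.vector(has_date %*% seq_len(ncol(has_date)))

# Pick the date at (row, indx) for every row
out <- data.frame(Id    = df1$Id,
                  Date  = df1[-1][cbind(seq_len(nrow(df1)), indx)],
                  Index = indx)
out
```

When a row can hold more than one date, max.col is not a safe substitute either, since by default it breaks ties at random.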
Or use dplyr/tidyr:
library(dplyr)
library(tidyr)
gather(df1, Index, Date, -Id) %>%
filter(!is.na(Date)) %>%
extract(Index, 'Index', '[^0-9]+([0-9]+)', convert=TRUE)
# Id Index Date
#1 12 1 2005-12-22
#2 11 1 2009-10-11
#3 29 2 2005-04-11
#4 44 2 2005-04-16
#5 45 3 2008-11-06
#6 39 3 2006-01-02
df1 <- structure(list(Id = c(12L, 11L, 29L, 45L, 39L, 44L),
Date1 = c("2005-12-22",
"2009-10-11", NA, NA, NA, NA), Date2 = c(NA, NA, "2005-04-11",
NA, NA, "2005-04-16"), Date3 = c(NA, NA, NA, "2008-11-06",
"2006-01-02", NA)), .Names = c("Id", "Date1", "Date2", "Date3"),
class = "data.frame", row.names = c(NA, -6L))
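In newer tidyr releases gather is superseded by pivot_longer, so the same reshaping can be sketched as a single call. This assumes tidyr 1.1.0 or later (for names_transform); the data is rebuilt so the block runs on its own.

```r
library(tidyr)

# Sample data rebuilt from the question (dates kept as character; an assumption)
df1 <- data.frame(
  Id    = c(12L, 11L, 29L, 45L, 39L, 44L),
  Date1 = c("2005-12-22", "2009-10-11", NA, NA, NA, NA),
  Date2 = c(NA, NA, "2005-04-11", NA, NA, "2005-04-16"),
  Date3 = c(NA, NA, NA, "2008-11-06", "2006-01-02", NA),
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  df1,
  cols            = -Id,                        # reshape every column except Id
  names_to        = "Index",                    # old column names go here
  names_prefix    = "Date",                     # strip "Date" so only the digit is left
  names_transform = list(Index = as.integer),   # "1" -> 1L
  values_to       = "Date",
  values_drop_na  = TRUE                        # same effect as filter(!is.na(Date))
)
long
```

Unlike gather, pivot_longer walks the input row by row, so the result keeps the original Id order.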