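Working in R, I have a table with 3 date columns that correspond to 3 indicator columns. For each row, I need to extract the indicator associated with the most recent date. Here is an example, where a corresponds to x, b to y, and c to z: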
a b c x y z
1 2017-09-06 <NA> 2017-01-02 N <NA> Y
2 2017-09-12 2017-03-24 <NA> N Y <NA>
3 2017-02-19 2017-10-28 2017-12-23 Y N Y
The result should be:
1 N
2 N
3 Y
However, there are some catches:
Here is some code that will generate sample data.
a <- sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 30)
b <- sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 30)
c <- sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 30)
abc <- data.frame(a,b,c)
abc[sample(1:20,8),1] <- NA
abc[sample(1:20,8),2] <- NA
abc[sample(1:20,8),3] <- NA
x <- sample(c("N","Y"),30, replace = TRUE)
y <- sample(c("N","Y"),30, replace = TRUE)
z <- sample(c("N","Y"),30, replace = TRUE)
x[is.na(abc[,1])] <- NA
y[is.na(abc[,2])] <- NA
z[is.na(abc[,3])] <- NA
xyz <- data.frame(x,y,z)
sd <- data.frame(abc,xyz)
I know I could do this with ifelse, but I suspect there is a better way. As always, thanks for your help.
Answer 0 (score: 0)
A data.table solution:
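library(data.table)
sd <- as.data.table(sd)
# Turn each date into a sortable number (yyyymmdd); NA dates stay NA
sd[, a_new := as.numeric(gsub("-", "", a))]
sd[, b_new := as.numeric(gsub("-", "", b))]
sd[, c_new := as.numeric(gsub("-", "", c))]
# Per row, find which of the three date columns holds the latest date
sd[, needed_index := which.max(unlist(.SD)), .SDcols = c("a_new", "b_new", "c_new"), by = 1:nrow(sd)]
# Per row, pick the indicator (x, y or z) at that position
sd[, final_result := unlist(.SD)[needed_index], .SDcols = c("x", "y", "z"), by = 1:nrow(sd)]
# Drop the helper columns
sd[, c("needed_index", "a_new", "b_new", "c_new") := NULL]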
sd
a b c x y z final_result
1: 2017-01-30 <NA> <NA> Y NA NA Y
2: <NA> 2017-01-23 2017-02-12 NA N N N
3: <NA> 2017-01-29 2017-12-25 NA N N N
4: 2017-05-25 2017-03-18 2017-04-08 N Y N N
5: <NA> <NA> 2017-04-17 NA NA N N
6: 2017-05-05 2018-01-01 2017-02-05 Y Y N Y
7: <NA> <NA> 2017-07-19 NA NA N N
8: 2017-11-02 2017-12-31 2017-02-25 Y N N N
9: 2017-12-12 <NA> 2017-04-09 N NA N N
10: <NA> 2017-01-02 <NA> NA Y NA Y
11: <NA> <NA> <NA> NA NA NA NA
12: 2017-08-28 <NA> 2017-03-14 N NA Y N
13: 2017-10-30 <NA> <NA> N NA NA N
14: <NA> 2017-03-30 <NA> NA N NA N
15: 2017-04-05 2017-12-01 2017-05-10 Y Y Y Y
16: <NA> 2017-03-13 <NA> NA Y NA Y
17: 2017-06-30 2017-05-09 2017-06-12 Y N Y Y
18: 2017-09-14 2017-12-27 <NA> N Y NA Y
19: 2017-04-09 <NA> 2017-11-16 Y NA Y Y
20: 2017-05-13 2017-07-28 <NA> N N NA N
21: 2017-03-14 2017-06-25 2017-07-01 N Y N N
22: 2017-08-22 2017-07-31 2017-08-24 Y Y N N
23: 2017-06-07 2017-05-12 2017-11-08 N N N N
24: 2017-09-27 2017-12-25 2017-06-20 Y Y N Y
25: 2017-08-14 2017-03-20 2017-04-16 N Y N N
26: 2017-12-23 2017-01-01 2017-06-25 Y N Y Y
27: 2017-02-20 2017-02-09 2017-04-13 Y Y Y Y
28: 2017-01-01 2017-02-14 2017-10-20 Y Y N N
29: 2017-07-28 2017-01-16 2017-06-02 N N N N
30: 2017-07-26 2017-09-25 2017-03-20 Y Y N Y
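A small base R sketch of the two ideas the code above relies on: a date can be compared as a number, and which.max() skips missing values. The vectors `d` and `all_missing` are made up for illustration. It also suggests that as.numeric() applied directly to a Date column would give the same ordering as the gsub() conversion, though that is an alternative and not what the answer uses.

```r
# One row's worth of dates, with a missing value in the middle (made-up data)
d <- as.Date(c("2017-09-06", NA, "2017-01-02"))

# The conversion used above: strip the dashes and read the rest as yyyymmdd
as_yyyymmdd <- as.numeric(gsub("-", "", d))

# Alternative: a Date is stored internally as days since 1970-01-01
as_days <- as.numeric(d)

# Both encodings order the dates the same way, and which.max() ignores NA
which.max(as_yyyymmdd)  # 1
which.max(as_days)      # 1

# With no non-missing date at all, which.max() returns a zero-length integer
all_missing <- as.Date(c(NA_character_, NA_character_, NA_character_))
length(which.max(as.numeric(all_missing)))  # 0
```

Rows where every date is missing are therefore worth checking separately, as in row 11 of the output above.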
Answer 1 (score: 0)
(Sample data generated with set.seed(123).)
# Make a matrix of dates
dates_mat <- sapply(sd[, c("a", "b", "c")], as.numeric)
dates_mat[is.na(dates_mat)] <- -Inf
# Retrieve index of max date
i_max <- max.col(dates_mat)
# Subset x, y, z with index
sd$ans <- as.matrix(sd[, c("x", "y", "z")])[cbind(seq_along(i_max), i_max)]
# a b c x y z ans
# 1 2017-04-16 2017-12-19 2017-09-01 N N Y N
# 2 2017-10-15 <NA> 2017-02-04 Y <NA> N Y
# 3 2017-05-29 <NA> <NA> Y <NA> <NA> Y
# 4 2017-11-17 2017-10-16 <NA> N Y <NA> N
# 5 <NA> 2017-01-09 2017-10-22 N Y Y Y
# 6 <NA> <NA> 2017-06-11 Y <NA> N N
# 7 <NA> <NA> <NA> N <NA> <NA> <NA>
# 8 2017-12-29 2017-03-19 2017-12-26 Y Y Y Y
# 9 <NA> 2017-04-24 <NA> Y Y <NA> Y
# 10 <NA> 2017-03-24 <NA> Y Y <NA> Y
# 11 <NA> <NA> 2017-09-26 N <NA> N N
# 12 2017-06-10 <NA> 2017-08-12 N <NA> Y Y
# 13 2017-08-28 <NA> <NA> Y <NA> <NA> Y
# 14 2017-07-22 2017-05-11 2017-01-01 N N Y N
# 15 2017-02-06 2017-02-23 <NA> N N <NA> N
# 16 2017-11-12 2017-02-18 2017-03-19 Y N N Y
# 17 <NA> 2017-03-23 2017-05-13 Y Y Y Y
# 18 2017-01-15 2017-06-12 2017-08-02 N N N N
# 19 <NA> 2017-04-03 2017-05-03 Y N Y Y
# 20 2017-11-28 <NA> <NA> Y <NA> <NA> Y
# 21 2017-11-04 2017-01-16 2017-03-26 Y N N Y
# 22 2017-12-20 2017-06-02 2017-08-19 N Y Y N
# 23 2017-08-09 2017-10-02 2017-05-24 Y Y Y Y
# 24 2017-12-08 2017-02-11 2017-09-28 N N Y N
# 25 2017-08-13 2017-07-11 2017-02-05 N N N N
# 26 2017-08-30 2017-03-12 2017-05-29 N N N N
# 27 2017-07-04 2017-02-13 2017-12-01 N N Y Y
# 28 2017-07-21 2017-09-13 2017-10-30 Y N Y Y
# 29 2017-04-08 2017-10-30 2017-10-27 N Y Y Y
# 30 2017-02-19 2017-05-07 2017-02-28 N N N N
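As a quick sanity check, the same steps can be run on the three-row example from the question. This is only a sketch: the data frame `ex` is typed in by hand, and `ties.method = "first"` is an addition that is not in the code above. By default max.col() breaks ties at random, so rows with two identical latest dates could otherwise give different answers from run to run.

```r
# The three-row example from the question, rebuilt by hand
ex <- data.frame(
  a = as.Date(c("2017-09-06", "2017-09-12", "2017-02-19")),
  b = as.Date(c(NA, "2017-03-24", "2017-10-28")),
  c = as.Date(c("2017-01-02", NA, "2017-12-23")),
  x = c("N", "N", "Y"),
  y = c(NA, "Y", "N"),
  z = c("Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Dates as day counts; missing dates become -Inf so they can never win
dates_mat <- sapply(ex[, c("a", "b", "c")], as.numeric)
dates_mat[is.na(dates_mat)] <- -Inf

# Assumption: "first" makes tie-breaking deterministic (the default is "random")
i_max <- max.col(dates_mat, ties.method = "first")

# Pick, for each row, the indicator in the column of the latest date
ex$ans <- as.matrix(ex[, c("x", "y", "z")])[cbind(seq_along(i_max), i_max)]
ex$ans  # "N" "N" "Y"
```

This matches the result the question asks for. Note that with `ties.method = "first"` a row whose dates are all missing falls back to the first column, so its answer is whatever x holds for that row.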