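Row-wise maximum, returning the value from another column

Time: 2017-05-25 17:30:24

Tags: r indexing max row

Working in R, I have a table with 3 date columns, each paired with one of 3 indicator columns. For each row, I need to extract the indicator associated with the most recent date. Here is an example, where a corresponds to x, b to y, and c to z: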

            a          b          c    x    y    z
1  2017-09-06       <NA> 2017-01-02    N <NA>    Y
2  2017-09-12 2017-03-24       <NA>    N    Y <NA>
3  2017-02-19 2017-10-28 2017-12-23    Y    N    Y

The result should be:

1 N
2 N
3 Y

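There are a few complications:

  • Any or all of the dates in a row can be NA; when a date is NA, its corresponding indicator is NA as well. If every date in the row is NA, returning NA is fine.
  • The dates are not necessarily in ascending or descending order across the columns.

Here is some code that generates sample data.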

# Three columns of 30 random dates drawn from 2017-01-01 to 2018-01-01
a <- sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 30)
b <- sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 30)
c <- sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 30)

abc <- data.frame(a,b,c)

# Blank out 8 dates per column, chosen among the first 20 rows
abc[sample(1:20,8),1] <- NA
abc[sample(1:20,8),2] <- NA
abc[sample(1:20,8),3] <- NA

# Random N/Y indicators, one per date column
x <- sample(c("N","Y"),30, replace = TRUE)
y <- sample(c("N","Y"),30, replace = TRUE)
z <- sample(c("N","Y"),30, replace = TRUE)

# An indicator is NA wherever its date is NA
x[is.na(abc[,1])] <- NA
y[is.na(abc[,2])] <- NA
z[is.na(abc[,3])] <- NA

xyz <- data.frame(x,y,z)

# Combined table: dates a, b, c followed by indicators x, y, z
sd <- data.frame(abc,xyz)

I know I could do this with ifelse, but I am guessing there is a better way. As always, thanks for your help.
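For reference, the ifelse route mentioned above might look roughly like the following. This is only a sketch on the three-row example from the question; the names `ex`, `latest` and `result` are made up for illustration, and it assumes the indicators are stored as character rather than factor.

```r
# Three-row example from the question (dates as Date, indicators as character)
ex <- data.frame(
  a = as.Date(c("2017-09-06", "2017-09-12", "2017-02-19")),
  b = as.Date(c(NA, "2017-03-24", "2017-10-28")),
  c = as.Date(c("2017-01-02", NA, "2017-12-23")),
  x = c("N", "N", "Y"),
  y = c(NA, "Y", "N"),
  z = c("Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Latest non-missing date in each row (NA only if all three dates are NA)
latest <- pmax(ex$a, ex$b, ex$c, na.rm = TRUE)

# Nested ifelse: take the indicator whose date equals the row's latest date.
# The !is.na() guards keep each test FALSE (not NA) when a date is missing.
ex$result <- ifelse(!is.na(ex$a) & ex$a == latest, ex$x,
             ifelse(!is.na(ex$b) & ex$b == latest, ex$y,
             ifelse(!is.na(ex$c) & ex$c == latest, ex$z, NA_character_)))

ex$result
```

This works, but every extra date/indicator pair adds another level of nesting, which is presumably why a more general approach is being asked for.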

2 answers:

Answer 0 (score: 0)

A data.table solution:

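library(data.table)
sd <- as.data.table(sd)

# Turn each date into a sortable number, e.g. 2017-09-06 -> 20170906 (NA stays NA)
sd[, a_new := as.numeric(gsub("-", "", a))]
sd[, b_new := as.numeric(gsub("-", "", b))]
sd[, c_new := as.numeric(gsub("-", "", c))]

# Obtain the needed index: per row, the position (1, 2 or 3) of the latest date
sd[, needed_index := which.max(unlist(.SD)), .SDcols = c("a_new", "b_new", "c_new"), by = 1:nrow(sd)]
# Per row, pick the indicator at that position
sd[, final_result := unlist(.SD)[needed_index], .SDcols = c("x", "y", "z"), by = 1:nrow(sd)]

# Drop the helper columns
sd[, needed_index := NULL]
sd[, a_new := NULL]
sd[, b_new := NULL]
sd[, c_new := NULL]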


sd
             a          b          c  x  y  z final_result
 1: 2017-01-30       <NA>       <NA>  Y NA NA            Y
 2:       <NA> 2017-01-23 2017-02-12 NA  N  N            N
 3:       <NA> 2017-01-29 2017-12-25 NA  N  N            N
 4: 2017-05-25 2017-03-18 2017-04-08  N  Y  N            N
 5:       <NA>       <NA> 2017-04-17 NA NA  N            N
 6: 2017-05-05 2018-01-01 2017-02-05  Y  Y  N            Y
 7:       <NA>       <NA> 2017-07-19 NA NA  N            N
 8: 2017-11-02 2017-12-31 2017-02-25  Y  N  N            N
 9: 2017-12-12       <NA> 2017-04-09  N NA  N            N
10:       <NA> 2017-01-02       <NA> NA  Y NA            Y
11:       <NA>       <NA>       <NA> NA NA NA           NA
12: 2017-08-28       <NA> 2017-03-14  N NA  Y            N
13: 2017-10-30       <NA>       <NA>  N NA NA            N
14:       <NA> 2017-03-30       <NA> NA  N NA            N
15: 2017-04-05 2017-12-01 2017-05-10  Y  Y  Y            Y
16:       <NA> 2017-03-13       <NA> NA  Y NA            Y
17: 2017-06-30 2017-05-09 2017-06-12  Y  N  Y            Y
18: 2017-09-14 2017-12-27       <NA>  N  Y NA            Y
19: 2017-04-09       <NA> 2017-11-16  Y NA  Y            Y
20: 2017-05-13 2017-07-28       <NA>  N  N NA            N
21: 2017-03-14 2017-06-25 2017-07-01  N  Y  N            N
22: 2017-08-22 2017-07-31 2017-08-24  Y  Y  N            N
23: 2017-06-07 2017-05-12 2017-11-08  N  N  N            N
24: 2017-09-27 2017-12-25 2017-06-20  Y  Y  N            Y
25: 2017-08-14 2017-03-20 2017-04-16  N  Y  N            N
26: 2017-12-23 2017-01-01 2017-06-25  Y  N  Y            Y
27: 2017-02-20 2017-02-09 2017-04-13  Y  Y  Y            Y
28: 2017-01-01 2017-02-14 2017-10-20  Y  Y  N            N
29: 2017-07-28 2017-01-16 2017-06-02  N  N  N            N
30: 2017-07-26 2017-09-25 2017-03-20  Y  Y  N            Y
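
Two details of this answer can be checked in isolation with plain vectors. The sketch below is not part of the answer: `pick_latest` is a hypothetical helper introduced only for illustration. It shows that a Date can be passed to as.numeric() directly (giving days since 1970-01-01, which sorts the same way as the YYYYMMDD number built with gsub), and that which.max() returns a zero-length result when every value is NA, so an explicit guard makes the all-NA case unambiguous.

```r
# One row's worth of dates, with a missing value in the middle
d <- as.Date(c("2017-09-06", NA, "2017-01-02"))

# as.numeric() on a Date gives days since 1970-01-01; which.max() skips NAs
idx <- which.max(as.numeric(d))

# When every value is NA, which.max() returns integer(0), not NA
all_na_idx <- which.max(c(NA_real_, NA_real_, NA_real_))

# Hypothetical helper: indicator at the latest date, or NA if there is no date
pick_latest <- function(dates, flags) {
  i <- which.max(as.numeric(dates))
  if (length(i) == 0L) NA_character_ else flags[i]
}

pick_latest(d, c("N", NA, "Y"))
pick_latest(as.Date(c(NA, NA, NA)), c(NA_character_, NA, NA))
```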

Answer 1 (score: 0)

(Sample data generated with set.seed(123).)

# Make a matrix of dates
dates_mat <- sapply(sd[, c("a", "b", "c")], as.numeric)
dates_mat[is.na(dates_mat)] <- -Inf
# Retrieve index of max date
i_max <- max.col(dates_mat)
# Subset x, y, z with index
sd$ans <- as.matrix(sd[, c("x", "y", "z")])[cbind(seq_along(i_max), i_max)]
#             a          b          c x    y    z  ans
# 1  2017-04-16 2017-12-19 2017-09-01 N    N    Y    N
# 2  2017-10-15       <NA> 2017-02-04 Y <NA>    N    Y
# 3  2017-05-29       <NA>       <NA> Y <NA> <NA>    Y
# 4  2017-11-17 2017-10-16       <NA> N    Y <NA>    N
# 5        <NA> 2017-01-09 2017-10-22 N    Y    Y    Y
# 6        <NA>       <NA> 2017-06-11 Y <NA>    N    N
# 7        <NA>       <NA>       <NA> N <NA> <NA> <NA>
# 8  2017-12-29 2017-03-19 2017-12-26 Y    Y    Y    Y
# 9        <NA> 2017-04-24       <NA> Y    Y <NA>    Y
# 10       <NA> 2017-03-24       <NA> Y    Y <NA>    Y
# 11       <NA>       <NA> 2017-09-26 N <NA>    N    N
# 12 2017-06-10       <NA> 2017-08-12 N <NA>    Y    Y
# 13 2017-08-28       <NA>       <NA> Y <NA> <NA>    Y
# 14 2017-07-22 2017-05-11 2017-01-01 N    N    Y    N
# 15 2017-02-06 2017-02-23       <NA> N    N <NA>    N
# 16 2017-11-12 2017-02-18 2017-03-19 Y    N    N    Y
# 17       <NA> 2017-03-23 2017-05-13 Y    Y    Y    Y
# 18 2017-01-15 2017-06-12 2017-08-02 N    N    N    N
# 19       <NA> 2017-04-03 2017-05-03 Y    N    Y    Y
# 20 2017-11-28       <NA>       <NA> Y <NA> <NA>    Y
# 21 2017-11-04 2017-01-16 2017-03-26 Y    N    N    Y
# 22 2017-12-20 2017-06-02 2017-08-19 N    Y    Y    N
# 23 2017-08-09 2017-10-02 2017-05-24 Y    Y    Y    Y
# 24 2017-12-08 2017-02-11 2017-09-28 N    N    Y    N
# 25 2017-08-13 2017-07-11 2017-02-05 N    N    N    N
# 26 2017-08-30 2017-03-12 2017-05-29 N    N    N    N
# 27 2017-07-04 2017-02-13 2017-12-01 N    N    Y    Y
# 28 2017-07-21 2017-09-13 2017-10-30 Y    N    Y    Y
# 29 2017-04-08 2017-10-30 2017-10-27 N    Y    Y    Y
# 30 2017-02-19 2017-05-07 2017-02-28 N    N    N    N
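
The same steps can be tried on the three-row example from the question, plus one all-NA row. This is a sketch rather than the answer's exact code: `ex` is an illustrative name, the indicators are assumed to be character, and `ties.method = "first"` is an added choice (max.col breaks ties at random by default) so that repeated runs return the same index.

```r
# Question's example with an extra row in which every date is NA
ex <- data.frame(
  a = as.Date(c("2017-09-06", "2017-09-12", "2017-02-19", NA)),
  b = as.Date(c(NA, "2017-03-24", "2017-10-28", NA)),
  c = as.Date(c("2017-01-02", NA, "2017-12-23", NA)),
  x = c("N", "N", "Y", NA),
  y = c(NA, "Y", "N", NA),
  z = c("Y", NA, "Y", NA),
  stringsAsFactors = FALSE
)

# Numeric matrix of dates; missing dates become -Inf so they never win
dates_mat <- sapply(ex[, c("a", "b", "c")], as.numeric)
dates_mat[is.na(dates_mat)] <- -Inf

# Column index of the latest date per row, with deterministic tie-breaking
i_max <- max.col(dates_mat, ties.method = "first")

# Matrix indexing with (row, column) pairs pulls one indicator per row
ex$ans <- as.matrix(ex[, c("x", "y", "z")])[cbind(seq_along(i_max), i_max)]

ex$ans
```

In the all-NA row every entry is -Inf, so some column is still selected, but the indicator in that column is NA as well, which gives the NA result the question asks for.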