I have a data frame that looks like this:
id date1 value1 date2 value2 date3 value3
1 1113 2012-01-14 29 2012-09-29 22 2013-10-28 21
2 1622 2012-12-05 93 2012-12-05 82 2013-01-22 26
3 1609 2014-08-30 30 2013-04-07 53 2013-03-20 100
4 1624 2014-01-20 84 2013-03-17 92 2014-01-10 81
5 1861 2014-10-08 29 2012-08-19 84 2012-09-21 56
6 1640 2014-03-05 27 2012-02-28 5 2015-01-11 65
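I would like to create a new column that holds whichever of the three values in "value1", "value2", and "value3" is the most recent, that is, the value whose paired date is the latest. I don't need to know which date it is associated with.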
id date1 value1 date2 value2 date3 value3 value_recent
1 1113 2012-01-14 29 2012-09-29 22 2013-10-28 21 21
2 1622 2012-12-05 93 2012-12-05 82 2013-01-22 26 26
3 1609 2014-08-30 30 2013-04-07 53 2013-03-20 100 30
4 1624 2014-01-20 84 2013-03-17 92 2014-01-10 81 84
5 1861 2014-10-08 29 2012-08-19 84 2012-09-21 56 29
6 1640 2014-03-05 27 2012-02-28 5 2015-01-11 65 65
Code to create a working example:
set.seed(1234)
id <- sample(1000:2000, 6, replace=TRUE)
date1 <- sample(seq(as.Date('2012-01-01'), as.Date('2016-01-01'), by="day"), 6)
value1 <- sample(1:100, 6, replace=TRUE)
date2 <- sample(seq(as.Date('2012-01-01'), as.Date('2016-01-01'), by="day"), 6)
value2 <- sample(1:100, 6, replace=TRUE)
date3 <- sample(seq(as.Date('2012-01-01'), as.Date('2016-01-01'), by="day"), 6)
value3 <- sample(1:100, 6, replace=TRUE)
df <- data.frame(id, date1, value1, date2, value2, date3, value3)
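Answer 0 (score: 1)
Edit: Following @Pierre Lafortune's answer, this can actually be combined into a single statement.
Edit 2: Added NAs to the data and changed the code to handle NAs.
This should work reasonably well. It does need a loop (the apply call); I'd be interested to see whether anyone can come up with a neat vectorized solution.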
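# Names of the date columns
date_cols <- colnames(df)[grep("date", colnames(df))]
# For each row, find the position of the latest date (which.max skips NAs),
# then take the column immediately to its right (+1), which holds the paired value.
# Matrix-indexing a data frame that mixes Date and integer columns goes through
# as.matrix() and returns character, so convert the result back to numeric.
df$value_recent <- as.numeric(df[cbind(1:nrow(df), grep("date", colnames(df))[apply(sapply(df[, date_cols], as.numeric), 1, which.max)] + 1)])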
df
id date1 value1 date2 value2 date3 value3 value_recent
1 1113 <NA> 29 2012-09-29 22 2013-10-28 21 21
2 1622 2012-12-05 93 2012-12-05 82 2013-01-22 26 26
3 1609 <NA> 30 2013-04-07 53 2013-03-20 100 53
4 1624 2014-01-20 84 2013-03-17 92 2014-01-10 81 84
5 1861 2014-10-08 29 2012-08-19 84 2012-09-21 56 29
6 1640 2014-03-05 27 2012-02-28 5 2015-01-11 65 65
Data:
df<-structure(list(id = c(1113L, 1622L, 1609L, 1624L, 1861L, 1640L
), date1 = structure(c(NA, 15679, NA, 16090, 16351, 16134), class = "Date"),
value1 = c(29L, 93L, 30L, 84L, 29L, 27L), date2 = structure(c(15612,
15679, 15802, 15781, 15571, 15398), class = "Date"), value2 = c(22L,
82L, 53L, 92L, 84L, 5L), date3 = structure(c(16006, 15727,
15784, 16080, 15604, 16446), class = "Date"), value3 = c(21L,
26L, 100L, 81L, 56L, 65L)), .Names = c("id", "date1", "value1",
"date2", "value2", "date3", "value3"), row.names = c(NA, -6L), class = "data.frame")
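On the request above for a vectorized solution, here is a minimal sketch, not a definitive implementation. It assumes the date and value columns appear in matching order (date1/value1, date2/value2, ...), replaces the row-wise apply with max.col, and treats a missing date as older than any real date. The names date_cols, value_cols, m and idx are illustrative only.

```r
# Same data as above, written out with readable dates (NAs in date1)
df <- data.frame(
  id     = c(1113L, 1622L, 1609L, 1624L, 1861L, 1640L),
  date1  = as.Date(c(NA, "2012-12-05", NA, "2014-01-20", "2014-10-08", "2014-03-05")),
  value1 = c(29L, 93L, 30L, 84L, 29L, 27L),
  date2  = as.Date(c("2012-09-29", "2012-12-05", "2013-04-07", "2013-03-17", "2012-08-19", "2012-02-28")),
  value2 = c(22L, 82L, 53L, 92L, 84L, 5L),
  date3  = as.Date(c("2013-10-28", "2013-01-22", "2013-03-20", "2014-01-10", "2012-09-21", "2015-01-11")),
  value3 = c(21L, 26L, 100L, 81L, 56L, 65L)
)

date_cols  <- grep("^date",  names(df))
value_cols <- grep("^value", names(df))

# Numeric matrix of dates (days since 1970-01-01), one column per date column
m <- do.call(cbind, lapply(df[date_cols], as.numeric))
# Assumption: a missing date should never win, so rank it below every real date
m[is.na(m)] <- -Inf

# Column position of the latest date in each row; ties go to the first column
idx <- max.col(m, ties.method = "first")

# Pick the matching value from the value columns only, so the result stays integer
df$value_recent <- as.matrix(df[value_cols])[cbind(seq_len(nrow(df)), idx)]
df$value_recent
```

A row whose dates are all NA would fall back to the first value column here; whether that is acceptable depends on the data.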
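Answer 1 (score: 1)
I use apply to go through the rows and find the position of the most recent date in each one, then use that index to look up the corresponding value. We use matrix subsetting to keep it concise: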
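# Row-wise position of the latest date; [1] keeps the first column when dates tie
indx <- apply(df[grep("date", names(df))], 1, function(x) which(x == max(x))[1])
# For each row, pick the value from the value column at that position
df$value_recent <- df[grep("val", names(df))][cbind(1:nrow(df), indx)]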
# id date1 value1 date2 value2 date3 value3 value_recent
# 1 1113 2012-01-14 29 2012-09-29 22 2013-10-28 21 21
# 2 1622 2012-12-05 93 2012-12-05 82 2013-01-22 26 26
# 3 1609 2014-08-30 30 2013-04-07 53 2013-03-20 100 30
# 4 1624 2014-01-20 84 2013-03-17 92 2014-01-10 81 84
# 5 1861 2014-10-08 29 2012-08-19 84 2012-09-21 56 29
# 6 1640 2014-03-05 27 2012-02-28 5 2015-01-11 65 65
(Note: arranging your data this way will cause you more trouble down the line.)
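The matrix subsetting used above may be unfamiliar, so here is a small illustration on a toy matrix. The matrix m holds the three value columns for the first three rows of the question's data, and indx is the per-row column to pick; both names are just for this sketch.

```r
# Columns are value1, value2, value3 for rows 1 to 3 (matrix() fills column-wise)
m <- matrix(c(29, 93, 30,
              22, 82, 53,
              21, 26, 100), nrow = 3)

# Column holding the most recent value in each row
indx <- c(3, 3, 1)

# A two-column index matrix selects one (row, column) cell per line,
# instead of the full rows-by-columns block that m[1:3, indx] would return
picked <- m[cbind(1:3, indx)]
picked
```

Each row of cbind(1:3, indx) is read as a (row, column) pair, which is why one value per row comes back.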
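Answer 2 (score: 0)
There may be a less verbose way, but here is one option. First reshape the data to "long" format, then split by id, sort each piece by date, extract the most recent record, and merge the result back with the original data frame.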
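# Stack the three date/value pairs into one "date" and one "value" column
ld <- reshape(df,
              idvar = "id",
              varying = list(paste0("date", 1:3),
                             paste0("value", 1:3)),
              v.names = c("date", "value"),
              direction = "long")
# One data frame per id
recent <- split(ld, ld$id)
recent <- lapply(recent, function(x) {
  d <- x[order(x$date), ]            # oldest first
  d <- d[nrow(d), c(1, 4)]           # last row; columns 1 and 4 are id and value
  names(d)[2] <- "value_recent"
  d
})
recent <- do.call(rbind, recent)
# Attach the most recent value to the original wide data
merge(df, recent, by = "id")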
# id date1 value1 date2 value2 date3 value3 value_recent
# 1 1204 2014-10-25 73 2012-12-22 39 2015-07-18 62 62
# 2 1667 2012-01-16 97 2014-02-28 30 2014-12-31 83 83
# 3 1673 2015-01-16 96 2014-12-16 50 2014-08-05 31 96
# 4 1722 2015-02-07 10 2013-12-25 4 2012-08-18 93 10
# 5 1882 2012-10-20 91 2014-12-28 71 2015-09-03 18 18
# 6 1883 2012-03-30 73 2015-04-26 4 2014-12-23 74 4
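The split/lapply step above can also be written without an explicit per-id function. The following is a sketch of the same reshape-then-pick idea, using the question's original data: sort the long data so the latest date comes first within each id, then keep the first row per id. The names ld, recent and res are illustrative, and the sketch assumes ids are unique in the wide data.

```r
# The question's example data, written out explicitly
df <- data.frame(
  id     = c(1113L, 1622L, 1609L, 1624L, 1861L, 1640L),
  date1  = as.Date(c("2012-01-14", "2012-12-05", "2014-08-30", "2014-01-20", "2014-10-08", "2014-03-05")),
  value1 = c(29L, 93L, 30L, 84L, 29L, 27L),
  date2  = as.Date(c("2012-09-29", "2012-12-05", "2013-04-07", "2013-03-17", "2012-08-19", "2012-02-28")),
  value2 = c(22L, 82L, 53L, 92L, 84L, 5L),
  date3  = as.Date(c("2013-10-28", "2013-01-22", "2013-03-20", "2014-01-10", "2012-09-21", "2015-01-11")),
  value3 = c(21L, 26L, 100L, 81L, 56L, 65L)
)

# Long format: one row per (id, date, value) triple
ld <- reshape(df,
              idvar = "id",
              varying = list(paste0("date", 1:3), paste0("value", 1:3)),
              v.names = c("date", "value"),
              direction = "long")

# Latest date first within each id (NA dates, if any, sort last)
ld <- ld[order(ld$id, ld$date, decreasing = TRUE), ]

# The first row seen for each id is now its most recent record
recent <- ld[!duplicated(ld$id), c("id", "value")]
names(recent)[2] <- "value_recent"

# merge() returns rows sorted by id, not in the original row order
res <- merge(df, recent, by = "id")
res
```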
Answer 3 (score: 0)
Here is a similar solution that also starts with reshape, but does the rest in a series of pipes: