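I am trying to return the sum of the values in one data frame that fall between two dates taken from another data frame. The answers already on Stack Overflow do not seem to fit my case. I tried data.table without success, so here goes.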
MeanRemaining <- seq(as.Date("2017-01-01"),as.Date("2017-02-28"),2)
MeanRemaining<-as.data.frame(cbind(MeanRemaining,lag(MeanRemaining)))
colnames(MeanRemaining)<-c("InspDate", "PrevInspDate")
MeanRemaining$InspDate<-as.Date(MeanRemaining$InspDate, origin = "1970/01/01")
MeanRemaining$PrevInspDate<-as.Date(MeanRemaining$PrevInspDate, origin = "1970/01/01")
Importantly, the date ranges are not actually fixed as above; they can be any ranges roughly a week apart.
DailyTonnes <- as.data.frame(cbind(as.data.frame(seq(as.Date("2016-12-01"),as.Date("2017-03-28"),1)),
(replicate(1,sample(abs(rnorm(118))*1000,rep=TRUE)))))
colnames(DailyTonnes)<-c("date","Vol")
I want to sum "Vol" in "DailyTonnes" over each date range in "MeanRemaining" and return the total "Vol" to the corresponding row of "MeanRemaining".
With the help of similar questions, I tried:
library(data.table)
setDT(MeanRemaining)
setDT(DailyTonnes)
MeanRemaining[DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date <= PrevInspDate),
by = .EACHI], TotalVol := V1, on = .(InspDate=date)]
But this returns NA values.
Any suggestions would be much appreciated.
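Answer 0 (score: 1)
I believe your question already contains everything needed for the answer.
I tidied your code a little and changed the last line, which was the only incorrect one. The join in that last line was overly complicated, and I do not think it brings any memory or performance gain.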
library(data.table)
# Create MeanRemaining
MeanRemaining <-
data.table(InspDate = seq(as.Date("2017-01-01"), as.Date("2017-02-28"), 2))
# I replaced lag with shift, I think it is clearer this way
MeanRemaining[, PrevInspDate := shift(InspDate, type = "lead", fill = 1000000L)]
# set seed for reproducibility
set.seed(13)
# Create DailyTonnes; I changed the end date to generate empty intervals
# (2016-12-01 to 2017-01-28 is 59 days, so 59 volumes are drawn to match the dates)
DailyTonnes <- data.table(date = seq(as.Date("2016-12-01"),
as.Date("2017-01-28"), 1),
Vol = sample(abs(rnorm(59)) * 1000, rep = TRUE))
# I changed the <= condition to <, I think it fits PrevInspDate better
# This should be your final result if I'm not wrong
SingleCase <-
DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
# SingleCase has two columns called date; this may look like a bug, but in a
# non-equi join data.table names the join columns after the column in DailyTonnes
print(names(SingleCase))
# change the names of the data.table to suit your needs
names(SingleCase) <- c("InspDate", "PrevInspDate", "TotalVol")
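To make it easier to see what the non-equi join with by = .EACHI does, here is a minimal sketch on toy data. The tables `daily` and `ranges` and their values are hypothetical, chosen only so that the sums can be checked by hand; they are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data: six consecutive days with volumes 1..6
daily <- data.table(
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-06"), by = 1),
  Vol  = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical intervals; the last one lies entirely outside the daily data
ranges <- data.table(
  InspDate     = as.Date(c("2017-01-01", "2017-01-03", "2017-01-10")),
  PrevInspDate = as.Date(c("2017-01-03", "2017-01-06", "2017-01-12"))
)

# For each row of ranges, sum Vol over rows with InspDate <= date < PrevInspDate
res <- daily[ranges, .(TotalVol = sum(Vol)),
             on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# Both join columns come back named "date", so rename them by position
names(res) <- c("InspDate", "PrevInspDate", "TotalVol")
print(res)
```

The first interval covers 1 and 2 January (total 3), the second covers 3 to 5 January (total 12), and the third matches no rows, so its total is NA.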
Retrieving several variables from MeanRemaining is trickier. With a small number of variables it is easy to solve:
# Add variables to MeanRemaining
for (i in 1:100) {
MeanRemaining[, paste0("extracol", i) := sample(.N)]
}
# Two variable case
smallmultiple <-
DailyTonnes[MeanRemaining, list(TotalVol = sum(Vol),
extracol1 = i.extracol1 ,
extracol2 = i.extracol2), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
# Correct date names
names(smallmultiple)[1:2] <- c("InspDate", "PrevInspDate")
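When many variables are involved, it gets harder. There is a feature request on GitHub that would solve your problem, but it is not available yet. Another question faces a similar problem, but its solution cannot be applied to your case.
One way to handle a large number of variables is: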
# obtain names of variables to be kept in the later join
joinkeepcols <-
setdiff(names(MeanRemaining), c("InspDate", "PrevInspDate"))
# the "i" indicates the table to take the variables from
joinkeepcols2 <- paste0("i.", joinkeepcols)
# Prepare an expression for the data.table environment
keepcols <-
paste(paste(joinkeepcols, joinkeepcols2, sep = " = "), collapse = ", ")
# Complete expression to be evaluated in data.table
evalexpression <- paste0("list(
TotalVol = sum(Vol),",
keepcols, ")")
# The magic comes here (you can assign it to MeanRemaining)
bigmultiple <-
DailyTonnes[MeanRemaining, eval(parse(text = evalexpression)), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
# Correct date names
names(bigmultiple)[1:2] <- c("InspDate", "PrevInspDate")
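As a possible alternative to building the expression with eval(parse()), the totals can be computed on their own and then added to the interval table by reference, which leaves all of its other columns untouched. This is a sketch, not the approach used above. It assumes that by = .EACHI returns exactly one row per row of the inner table, in the same order, and the tables `daily` and `ranges` with their values are hypothetical.

```r
library(data.table)

# Hypothetical toy data: six consecutive days with volumes 1..6
daily <- data.table(
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-06"), by = 1),
  Vol  = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical intervals carrying extra columns that must be preserved
ranges <- data.table(
  InspDate     = as.Date(c("2017-01-01", "2017-01-03")),
  PrevInspDate = as.Date(c("2017-01-03", "2017-01-06")),
  extracol1    = c("a", "b"),
  extracol2    = c(10L, 20L)
)

# One total per row of ranges, in the order of ranges (the unnamed j result is V1)
totals <- daily[ranges, sum(Vol),
                on = .(date >= InspDate, date < PrevInspDate),
                by = .EACHI]$V1

# Add the totals as a new column by reference; the extra columns stay as they are
ranges[, TotalVol := totals]
print(ranges)
```

With the question's data the same pattern would use DailyTonnes and MeanRemaining in place of the toy tables, so no column names need to be pasted into a string.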