How do I aggregate between two dates in R?

Time: 2018-01-03 06:33:59

Tags: r date aggregate

Here are two tables:

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

I want to display the result like this:

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

I used this command:

for(i in 1:nrow(Table1)){

  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]


 code=aggregate(list(Table2$Qty),
by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate  & Table2$Date <= endDate), FUN=sum)

}

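I want to aggregate the quantities between the given dates in the first table, i.e. between the first and second dates, between the second and third dates, and so on. In addition, for the last date the quantities need to be aggregated through to the end of the dates in the other table.

Thanks in advance!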

1 Answer:

Answer 0: (score: 1)

Compared with your previous one, I understand what you mean in this question.

This gives you exactly the output you provided:

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13       15

It is produced by the following code (see the comments):

#your data & preps
df1 <- read.table(text=
                    "'Date'                   'OldPrice'   'NewPrice'
                  '2014-06-12 09:32:56'     '0'          '10'
                  '2014-06-27 16:13:36'     '10'         '12'
                  '2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors=FALSE, header=TRUE)

df2 <- read.table(text=
                    "'Date'                  'Qty'
                  '2014-06-15 18:09:23'     '5'
                  '2014-06-19 12:04:29'     '4'
                  '2014-06-22 13:21:34'     '3'
                  '2014-06-29 19:01:22'     '6'
                  '2014-07-01 18:02:33'     '3'
                  '2014-09-29 22:41:47'     '6'", stringsAsFactors=FALSE, header=TRUE)

df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) #convert into datetime formats
df1 <- df1[with(df1, order(Date)),] #order df1 by Date
values <- vector("list", length = nrow(df1)+1) #declare a list of specific length of df1+1
out_of_time_dates_before <- c(); out_of_time_dates_after <- c() #here will be dates that come before or after dates available in df1
names(values) <- c(1:(length(values)-2), "out_of_time_dates_before", "out_of_time_dates_after")

#producing the main outputs
for(j in 1:nrow(df2)){
  print(paste0("Being processed: ", df2$Date[j]))
  for(i in 1:(nrow(df1)-1)){
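    if(df2$Date[j] > df1$Date[i] && df2$Date[j] < df1$Date[i+1]){ #strictly between two consecutive df1 dates (a date exactly equal to a df1 date is not counted)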
      values[[i]] <- append(values[[i]], df2$Qty[j])
    } 
  } 
  if(df2$Date[j]<min(df1$Date)){
    out_of_time_dates_before <- append(out_of_time_dates_before, df2$Qty[j])
    values[["out_of_time_dates_before"]] <- append(values[["out_of_time_dates_before"]], df2$Qty[j])
  } else if(df2$Date[j] > max(df1$Date)){
    out_of_time_dates_after <- append(out_of_time_dates_after, df2$Qty[j])
    values[["out_of_time_dates_after"]] <- append(values[["out_of_time_dates_after"]], df2$Qty[j])
  }
}

#aggregating the quantity for the date ranges and all that falls before or after the date ranges not available in df1   
df1$Quantity <- c(0, sapply(values, sum)[1:(nrow(df1)-1)]) #replace the leading quantity value with 0 (as per your example)
df1$Quantity[1] <- df1$Quantity[1]+sapply(values, sum)["out_of_time_dates_before"]
df1$Quantity[length(df1$Quantity)] <- df1$Quantity[length(df1$Quantity)]+sapply(values, sum)["out_of_time_dates_after"]

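I think you have an interesting problem to solve; it was just communicated to the SO community in an unfortunate way. For example, it is not clear what should happen to dates in df2 that fall before any range available in df1, so in the code above I add those quantities (if any) to the first date. This is more general, and it mirrors what you expect for dates in df2 that come after the date ranges in df1 (they are added to the last date in df1).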
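For larger tables, the same rule (each interval's total goes to the date that closes it, early dates go to the first row, late dates go to the last row) can probably be expressed without nested loops. The following is only a minimal sketch using base R's `findInterval()`, not the method used above. It assumes the sample data from the question, fixes the time zone to UTC for reproducibility, and the names `idx`, `target` and `totals` are made up for illustration. Unlike the loop above, `left.open = TRUE` treats each interval as `(start, end]`, so a df2 date exactly equal to a df1 date is counted towards that date.

```r
# Sample data copied from the question (time zone fixed to UTC as an assumption)
df1 <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
df2 <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33", "2014-09-29 22:41:47"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3, 6)
)

df1 <- df1[order(df1$Date), ]  # findInterval() needs sorted breakpoints
n <- nrow(df1)

# For each df2 date, count how many df1 dates lie strictly before it:
# 0 = on or before the first df1 date, n = after the last df1 date.
idx <- findInterval(as.numeric(df2$Date), as.numeric(df1$Date), left.open = TRUE)

# Credit interval k (between df1 dates k and k+1) to row k+1;
# dates before the first range fall on row 1, dates after the last range on row n.
target <- pmin(idx + 1, n)

# Sum Qty per target row; rows that received nothing come back as NA, so set them to 0
totals <- tapply(df2$Qty, factor(target, levels = seq_len(n)), sum)
totals[is.na(totals)] <- 0
df1$Qty <- as.numeric(totals)

print(df1)
```

With the sample data this should reproduce the Qty column 0, 12, 15 from the desired output. If boundary dates should instead be counted towards the following interval, dropping `left.open = TRUE` switches the intervals to `[start, end)`.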