How to aggregate between two dates in R?

Time: 2018-01-02 07:04:42

Tags: r date aggregate

Here are the two tables:

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

I would like the result to look like this:

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

I used the following code:

for(i in 1:nrow(Table1)){

  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]


 code=aggregate(list(Table2$Qty),
by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate  & Table2$Date <= endDate), FUN=sum)

}

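I want Qty to be summed between the dates given in the first table, i.e. between the first and second dates, between the second and third dates, and so on.

Thanks in advance!

3 answers:

Answer 0: (score: 3)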

We can do a rolling join with data.table:

library(data.table)
res <- setDT(df1)[df2, roll = -Inf, on = .(Date)][, .(Qty = sum(Qty)),
           .(OldPrice, NewPrice)][df1, on = .(OldPrice, NewPrice)][is.na(Qty), Qty := 0]
setcolorder(res, c(names(df1), "Qty"))
res
#                   Date OldPrice NewPrice Qty
#1: 2014-06-12 09:32:56        0       10   0
#2: 2014-06-27 16:13:36       10       12  12
#3: 2014-08-12 22:41:47       12       13   9
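
The answer does not show how `df1` and `df2` were built, and the effect of `roll = -Inf` (each `df2` row is matched to the next `df1` date at or after it) may not be obvious. The following is only a base R sketch of that matching rule, not the answerer's method: it swaps in `findInterval`, and the UTC time zone and the variable name `idx` are assumptions made for illustration.

```r
# Assumed inputs: the question's tables, with Date parsed as POSIXct.
# The UTC time zone is an assumption; the original post does not state one.
df1 <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
df2 <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33", "2014-09-29 22:41:47"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3, 6)
)

# For each df2 row, find the index of the first df1 date that is >= it.
# left.open = TRUE makes the intervals (start, end]; adding 1 moves from the
# interval's start row to its end row. df1$Date must be sorted ascending.
idx <- findInterval(as.numeric(df2$Date), as.numeric(df1$Date),
                    left.open = TRUE) + 1L

# Sum Qty per df1 row. Rows of df2 later than the last df1 date get an index
# beyond nrow(df1) and are therefore never counted.
df1$Qty <- vapply(seq_len(nrow(df1)),
                  function(i) sum(df2$Qty[idx == i]),
                  numeric(1))
df1
```

With these inputs the 2014-09-29 row of `df2` falls after the last date in `df1`, so it is left out of every sum, which is consistent with the 9 (rather than 15) shown for the third row above.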

Answer 1: (score: 0)

A somewhat verbose idea using dplyr and tidyr:

library(dplyr)
library(tidyr)

full_join(Table1, Table2, by = "Date") %>% 
  arrange(Date) %>% 
  fill(OldPrice, NewPrice, .direction = "up") %>% 
  group_by(OldPrice, NewPrice) %>% 
  summarize(Qty = sum(Qty, na.rm = TRUE)) %>% 
  ungroup() %>% 
  select(Qty) %>% 
  bind_cols(Table1, .)

#                  Date OldPrice NewPrice Qty
# 1 2014-06-12 09:32:56        0       10   0
# 2 2014-06-27 16:13:36       10       12  12
# 3 2014-08-12 22:41:47       12       13   9

Answer 2: (score: 0)

Since you started with a for loop, you could do it the for-loop way as follows:

df1 <- read.table(text=
"'Date'                   'OldPrice'   'NewPrice'
'2014-06-12 09:32:56'     '0'          '10'
'2014-06-27 16:13:36'     '10'         '12'
'2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors=F,header=T)

df2 <- read.table(text=
"'Date'                  'Qty'
'2014-06-15 18:09:23'     '5'
'2014-06-19 12:04:29'     '4'
'2014-06-22 13:21:34'     '3'
'2014-06-29 19:01:22'     '6'
'2014-07-01 18:02:33'     '3'" , stringsAsFactors=F, header=T)

df1 <- df1[with(df1, order(Date)),] #order df1 by Date
df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) #convert into datetime formats
values <- vector("list", length = nrow(df1)) #declare a list of specific length of df1

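for(i in seq_len(nrow(df1) - 1)){ #stop before the last row, where df1$Date[i+1] would be NA
  for(j in seq_len(nrow(df2))){
    if(df2$Date[j] > df1$Date[i] && df2$Date[j] < df1$Date[i+1]){
      values[[i]] <- append(values[[i]], df2$Qty[j]) #collect Qty values falling between date i and date i+1
    }
  }
}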

df1$Quantity <- c(0, sapply(values, sum)[1:(nrow(df1)-1)]) #replace the leading quantity value with 0 (as per your example)

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13        9

Obviously this is more work, but it may help if you are stuck with for loops.
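
If a loop is preferred, the nested loop can probably be reduced to a single loop that builds one logical mask per interval, which is close to what the question's `startDate`/`endDate` attempt was aiming for. This is a minimal sketch under assumptions: the UTC time zone and the names `qty` and `in_window` are mine, and the upper bound uses `<=` as in the question's code, whereas the loop above uses `<`.

```r
# Assumed inputs: same data as in this answer, parsed as POSIXct (UTC assumed).
df1 <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
df2 <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3)
)

df1 <- df1[order(df1$Date), ]   # intervals only make sense on sorted dates
qty <- numeric(nrow(df1))       # the first row stays 0: no interval ends before it

# Row i receives everything sold after the previous date, up to and including date i.
for (i in seq_len(nrow(df1))[-1]) {
  in_window <- df2$Date > df1$Date[i - 1] & df2$Date <= df1$Date[i]
  qty[i] <- sum(df2$Qty[in_window])
}

df1$Qty <- qty
df1
```

Because the loop starts at the second row and looks back at row `i - 1`, it never indexes past the end of `df1`, so no `NA` end date can reach the comparison.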