Summing by time criteria in R

Time: 2016-03-22 03:02:43

Tags: r

I have a dataset that looks like this (call it the stamp data):

Date_Time            Cost   
---------           -----  
01/02/2015 01:52 PM    6     
01/02/2015 02:22 PM    2    
01/03/2015 02:42 PM    50   
01/04/2015 03:01 PM    25 

And a different dataset (the customer data) that looks like this:

Purchase_time            Amount
-------------         ---------
01/02/2015 01:57 PM         5
01/02/2015 02:46 PM         12
01/02/2015 03:13 PM         2
01/02/2015 03:30 PM         8

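For each Date_Time in the stamp data, I want to sum the "Amount" column of the customer data over different time windows, so that the final result looks like this: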

Date_Time            Cost     Amount_15min   Amount_30min
---------           -----    --------------  -------------
01/02/2015 01:52 PM    6          5             5
01/02/2015 02:22 PM    2          0            12
01/03/2015 02:42 PM    50         12           12
01/04/2015 03:01 PM    25         8            8

Ideally, I would like to create columns at 15-minute intervals up to 360 minutes (or longer).

How can I do this in R?

Thanks!

1 Answer:

Answer 0: (score: 0)

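I think you will find most of this code straightforward. We need to convert the dates to POSIX date-time objects so that we can do arithmetic on them. A POSIXct object is stored as the number of seconds elapsed since January 1, 1970 (UTC), so for the window arithmetic we convert to numeric and then add or subtract a number of seconds.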
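As a quick illustration of that representation, here is a minimal sketch using one timestamp from the question. The variable names and `tz = "UTC"` are assumptions made for the example (a fixed time zone keeps the result independent of the local one); they are not part of the original answer.

```r
# Parse one of the question's timestamps.
# tz = "UTC" is an assumption so the result does not depend on the local zone.
t <- as.POSIXct("01/02/2015 01:52 PM", format = "%m/%d/%Y %I:%M %p", tz = "UTC")

secs <- as.numeric(t)       # seconds since 1970-01-01 00:00:00 UTC
earlier <- secs - 15 * 60   # 15 minutes earlier, still a plain number of seconds

# Convert the number back to a date-time to inspect it
back <- as.POSIXct(earlier, origin = "1970-01-01", tz = "UTC")
print(back)
```

Subtracting `15 * 60` from the numeric value moves the time back by exactly 15 minutes, which is the arithmetic the loop in the answer relies on.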

### Build test data frame
### times is a character vector and cost is a numeric vector
times <- c(
"01/02/2015 01:52 PM",
"01/02/2015 01:57 PM",
"01/02/2015 01:58 PM",
"01/02/2015 02:52 PM",
"01/02/2015 02:55 PM")

cost <- c(8, 2, 50, 26, 7)

df <- data.frame(times = times, cost = cost, stringsAsFactors = FALSE)


#### convert times to POSIXct date-times
#### (tz = "UTC" is an assumption here; it avoids local daylight-saving gaps)
df$times <- as.POSIXct(df$times, format = "%m/%d/%Y %I:%M %p", tz = "UTC")

### polling frequency in minutes
pollinglength <- 15

### create empty vector to hold sums
amount <- rep(NA, nrow(df))

for (i in seq_len(nrow(df))) {

  ### date-times can be compared directly: keep rows at or before the current time
  upperWindow <- df$times <= df$times[i]

  ### convert to numeric seconds so the window arithmetic is explicit:
  ### keep rows later than (current time - pollinglength minutes)
  lowerWindow <- as.numeric(df$times) > (as.numeric(df$times[i]) - pollinglength * 60)

  amount[i] <- sum(df$cost[upperWindow & lowerWindow])
}

### Add back to data frame
df <- cbind(df, amount)
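
The loop above sums `cost` within a single data frame over one backward-looking window. The question instead has two tables and asks for many window lengths, so the following is a rough sketch of how the same idea might be adapted. It rests on several assumptions: the data frame names `stamp` and `customer` and the helper `window_sum` are hypothetical; only the first two stamp rows from the question are used; and each window is taken to run forward from the stamp time, inclusive at both ends, which is what the first two rows of the expected result suggest but the question does not state.

```r
# Hypothetical data frames mirroring the question's two tables
fmt <- "%m/%d/%Y %I:%M %p"
stamp <- data.frame(
  Date_Time = as.POSIXct(c("01/02/2015 01:52 PM", "01/02/2015 02:22 PM"),
                         format = fmt, tz = "UTC"),
  Cost = c(6, 2)
)
customer <- data.frame(
  Purchase_time = as.POSIXct(c("01/02/2015 01:57 PM", "01/02/2015 02:46 PM",
                               "01/02/2015 03:13 PM", "01/02/2015 03:30 PM"),
                             format = fmt, tz = "UTC"),
  Amount = c(5, 12, 2, 8)
)

# Sum Amount for purchases in [start, start + minutes]
# (forward-looking, inclusive window: an assumption, not stated in the question)
window_sum <- function(start, minutes) {
  in_window <- customer$Purchase_time >= start &
    customer$Purchase_time <= start + minutes * 60
  sum(customer$Amount[in_window])
}

# One column per window length: 15, 30, ..., 360 minutes
for (w in seq(15, 360, by = 15)) {
  stamp[[paste0("Amount_", w, "min")]] <-
    vapply(seq_len(nrow(stamp)),
           function(i) window_sum(stamp$Date_Time[i], w),
           numeric(1))
}

print(stamp[, c("Date_Time", "Cost", "Amount_15min", "Amount_30min")])
```

Changing the upper bound of `seq(15, 360, by = 15)` extends the columns beyond 360 minutes. For large tables a join-based approach would likely be faster than this row-by-row loop, but the loop keeps the window logic easy to read.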