R finding the average order interval (number of days)

Time: 2015-06-25 19:08:50

Tags: r for-loop

My goal is to obtain the average number of days between purchases of a given product. If Product_A is purchased three times over a given period ('2012-12-01', '2012-12-05', '2012-12-10'), then the intervals are 4 and 5 days, so the average order interval is 4.5 days.

I wrote a for loop to calculate the interval between two consecutive orders (I can then use the aggregate function to calculate the mean or median by product), but I keep getting a length error. This is supposed to be a scalable solution.

Here is a sample data frame:

product_info <- data.frame(productId = c("A", "A", "A", "B", "B", "B"),
    order_date = c("2014-05-01", "2014-05-05", "2014-05-10", "2014-06-01", "2014-06-07", "2014-06-18"),
    stringsAsFactors = FALSE)

Here is my code:

for (i in 2:length(unique(product_info$productId))){
    if(product_info$productId[i]==product_info$productId[i-1]){
        product_info$interval[i] <- as.integer(difftime(product_info$order_date[i],product_info$order_date[i-1]))
    }
}

My desired output is:

product_info <- data.frame(productId = c("A", "A", "A", "B", "B", "B"),
    order_date = c("2014-05-01", "2014-05-05", "2014-05-10", "2014-06-01", "2014-06-07", "2014-06-18"),
    interval = c("0", "4", "5", "0", "6", "11"),
    stringsAsFactors = FALSE)

Any help would be very much appreciated. Thank you.

3 answers:

Answer 0 (score: 3)

You can try

product_info$order_date <- as.Date(product_info$order_date)
product_info$interval <- with(product_info, ave(as.numeric(order_date),
    productId, FUN=function(x) c(0, diff(x))))

product_info
  productId order_date interval
1         A 2014-05-01        0
2         A 2014-05-05        4
3         A 2014-05-10        5
4         B 2014-06-01        0
5         B 2014-06-07        6
6         B 2014-06-18       11

Or using data.table

library(data.table) # v1.9.5+
setDT(product_info)[, interval := c(0, diff(as.Date(order_date))), productId]

If 'order_date' is not already sorted, we have to order it before taking the diff

setDT(product_info)[, order_date := as.Date(order_date)
    ][order(order_date), interval := as.numeric(order_date -
        shift(order_date, fill=order_date[1L])), by = productId]
#    productId order_date interval
# 1:         A 2014-05-01        0
# 2:         A 2014-05-05        4
# 3:         A 2014-05-10        5
# 4:         B 2014-06-01        0
# 5:         B 2014-06-07        6
# 6:         B 2014-06-18       11
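The question's end goal is the average interval per product, which it mentions computing with aggregate. A minimal base R sketch of that last step follows, using the sample data from the question. Two choices here are assumptions rather than part of the answer above: the first order of each product is marked NA instead of 0 so that it does not pull the mean down, and the name avg_interval is only illustrative.

```r
# Sample data from the question
product_info <- data.frame(
  productId  = c("A", "A", "A", "B", "B", "B"),
  order_date = c("2014-05-01", "2014-05-05", "2014-05-10",
                 "2014-06-01", "2014-06-07", "2014-06-18"),
  stringsAsFactors = FALSE
)
product_info$order_date <- as.Date(product_info$order_date)

# Sort so that diff() always compares consecutive orders of the same product
product_info <- product_info[order(product_info$productId,
                                   product_info$order_date), ]

# Interval in days; NA (an assumption, instead of 0) marks each product's first order
product_info$interval <- ave(
  as.numeric(product_info$order_date), product_info$productId,
  FUN = function(x) c(NA, diff(x))
)

# The formula interface of aggregate() omits rows with NA by default
avg_interval <- aggregate(interval ~ productId, data = product_info, FUN = mean)
avg_interval
```

Swapping FUN = mean for FUN = median gives the median interval instead. If the leading 0 is kept as in the answer above, the mean for product A becomes 3 rather than 4.5, so it is worth deciding up front which convention is wanted.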

Answer 1 (score: 2)

Convert to date format:

product_info$order_date <- as.Date(product_info$order_date)

Using dplyr:

library(dplyr)
product_info %>%
    group_by(productId) %>%
    mutate(interval=c(0,diff(order_date)))

Answer 2 (score: 2)

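Here is a dplyr solution. You first convert to date format, then sort by date, group by product, and finally add the column, which is the difference between each order date and the previous one for the same product. Note that the 0 days have been replaced by NA, which IMHO is more appropriate than 0.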

library(dplyr)
product_info <- product_info %>%
    mutate(order_date=as.Date(order_date)) %>%
    arrange(order_date) %>%
    group_by(productId) %>%
    mutate(interval=order_date-lag(order_date))

product_info
  productId order_date interval
1         A 2014-05-01  NA days
2         A 2014-05-05   4 days
3         A 2014-05-10   5 days
4         B 2014-06-01  NA days
5         B 2014-06-07   6 days
6         B 2014-06-18  11 days
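
Because the first order of each product is NA here, the per-product average can be taken directly with na.rm = TRUE. The following is a small sketch under that assumption, again using the question's sample data; the names avg_interval, mean_interval and median_interval are illustrative only, and the interval is converted to a plain number of days so that it is not a difftime.

```r
library(dplyr)

# Sample data from the question
product_info <- data.frame(
  productId  = c("A", "A", "A", "B", "B", "B"),
  order_date = c("2014-05-01", "2014-05-05", "2014-05-10",
                 "2014-06-01", "2014-06-07", "2014-06-18"),
  stringsAsFactors = FALSE
)

avg_interval <- product_info %>%
  mutate(order_date = as.Date(order_date)) %>%
  arrange(productId, order_date) %>%
  group_by(productId) %>%
  # Days since the previous order of the same product; NA for the first order
  mutate(interval = as.numeric(order_date - lag(order_date), units = "days")) %>%
  # Collapse to one row per product, ignoring the NA of the first order
  summarise(
    mean_interval   = mean(interval, na.rm = TRUE),
    median_interval = median(interval, na.rm = TRUE)
  )

avg_interval
```

A product with only a single order has no interval at all; with this sketch its mean comes out as NaN, which may need separate handling.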