I have a dataset in this format:
Order_Name Frequency Order_Dt
A 2 2016-01-20
A 2 2016-05-01
B 1 2016-02-12
C 3 2016-03-04
C 3 2016-07-01
C 3 2016-08-09
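I need to find the average difference between consecutive order dates for the orders that have been placed more than once, i.e. those with Frequency > 1.
Answer 0 (score: 0)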
require(dplyr)
# loading the data
df0 <- read.table(text =
'Order_Name Frequency Order_Dt
A 2 2016-01-20
A 2 2016-05-01
B 1 2016-02-12
C 3 2016-03-04
C 3 2016-07-01
C 3 2016-08-09',
stringsAsFactors = FALSE,
header = TRUE)
# putting the date in the right format
df0$Order_Dt <- as.Date(df0$Order_Dt)
# obtaining the averages
# group before calling lag(), so the first row of each order gets NA
# instead of a difference against the previous order's last date
df0 %>%
  filter(Frequency > 1) %>%
  arrange(Order_Name, Order_Dt) %>%
  group_by(Order_Name) %>%
  mutate(diff_date = Order_Dt - lag(Order_Dt)) %>%
  summarise(avg_days = mean(diff_date, na.rm = TRUE))
# A tibble: 2 × 2
  Order_Name avg_days
       <chr>   <time>
1          A 102 days
2          C  79 days
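For a quick cross-check without dplyr, here is a minimal base-R sketch on the same sample data. The names `repeated` and `avg_days` are only illustrative, and it assumes every order with Frequency > 1 really has at least two rows. Within each order, it sorts the dates, takes the gaps between consecutive dates with `diff()`, and averages them.

```r
# Sample data from the question
df0 <- read.table(text =
'Order_Name Frequency Order_Dt
A 2 2016-01-20
A 2 2016-05-01
B 1 2016-02-12
C 3 2016-03-04
C 3 2016-07-01
C 3 2016-08-09',
stringsAsFactors = FALSE,
header = TRUE)
df0$Order_Dt <- as.Date(df0$Order_Dt)

# Keep only orders placed more than once, sorted by order and date
repeated <- df0[df0$Frequency > 1, ]
repeated <- repeated[order(repeated$Order_Name, repeated$Order_Dt), ]

# as.numeric() on a Date gives days since 1970-01-01, so diff() returns
# the gap in days between consecutive dates within each order
avg_days <- tapply(as.numeric(repeated$Order_Dt),
                   repeated$Order_Name,
                   function(d) mean(diff(d)))
avg_days
```

For this data, A has a single gap of 102 days, and C has gaps of 119 and 39 days, which average to 79.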