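I have df1$NextBizDay, which contains dates. Another data frame, df2, has two columns: df2$Date and df2$Sales.
Now I am trying to compute the average sales using the dates in df1$NextBizDay. Since df1$NextBizDay can contain duplicate dates, I use as.data.frame(table(df1$NextBizDay))[,2] to get the frequency of each date. Next I need to divide the summed sales by that frequency to get the average sales.
I know that aggregate(Sales~Date,df2,sum)[,2] gives me the total sales for each date, but I am not sure how to proceed from there.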
Sample input:
df1$NextBizday
2018-10-22
2018-10-22
2018-10-23
2018-10-23
2018-10-23
2018-10-24
df2$Date df2$Sales
2018-10-22 1000
2018-10-23 2000
2018-10-24 3000
2018-10-25 4000
2018-10-26 5000
2018-10-27 6000
Expected output (df1):
NextBizday AvgSales
2018-10-22 500
2018-10-22 500
2018-10-23 666.6666667
2018-10-23 666.6666667
2018-10-23 666.6666667
2018-10-24 3000
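Basically, what I want to do is look up each date from df1 in df2, then divide the sales for that date by the number of times the date occurs in df1.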
Answer 0 (score: 1)
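We can use a data.table join between the two datasets on the 'NextBizday / Date' columns, then do the assignment (:=) by taking the sum of 'Sales' and dividing it by the number of rows (.N) to create 'AvgSales'.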
library(data.table)
setDT(df1)[df2, AvgSales := sum(Sales)/.N, on = .(NextBizday = Date), by = .EACHI]
df1
# NextBizday AvgSales
#1: 2018-10-22 500.0000
#2: 2018-10-22 500.0000
#3: 2018-10-23 666.6667
#4: 2018-10-23 666.6667
#5: 2018-10-23 666.6667
#6: 2018-10-24 3000.0000
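For readers less familiar with by = .EACHI, the same result can also be reached in separate steps. This is only a sketch under assumptions: it rebuilds the sample data from the question, and the helper column name Freq is hypothetical (it is not part of the answer above).

```r
library(data.table)

# Sample data rebuilt from the question
df1 <- data.table(NextBizday = as.Date(c("2018-10-22", "2018-10-22",
                                          "2018-10-23", "2018-10-23",
                                          "2018-10-23", "2018-10-24")))
df2 <- data.table(Date  = as.Date("2018-10-22") + 0:5,
                  Sales = c(1000L, 2000L, 3000L, 4000L, 5000L, 6000L))

# Step 1: count how often each date occurs in df1 (Freq is a hypothetical helper column)
df1[, Freq := .N, by = NextBizday]

# Step 2: update join that copies the matching Sales value from df2 into df1
df1[df2, Sales := i.Sales, on = .(NextBizday = Date)]

# Step 3: divide sales by frequency, then drop the helper columns
df1[, AvgSales := Sales / Freq][, c("Freq", "Sales") := NULL]

df1
```

Splitting the work this way is more verbose than the one-liner, but each intermediate column can be inspected before it is removed.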
Another option is tidyverse, which may be easier to understand.
library(dplyr)
df1 %>%
# do a left join
left_join(df2, by = c("NextBizday" = "Date")) %>%
# grouped by NextBizday
group_by(NextBizday) %>%
# divide the `first` value of 'Sales' by the number of rows `n()`
transmute(AvgSales = first(Sales)/n())
# A tibble: 6 x 2
# Groups: NextBizday [3]
# NextBizday AvgSales
# <date> <dbl>
#1 2018-10-22 500
#2 2018-10-22 500
#3 2018-10-23 667.
#4 2018-10-23 667.
#5 2018-10-23 667.
#6 2018-10-24 3000
# Data used in the examples above
df1 <- structure(list(NextBizday = structure(c(17826, 17826, 17827,
17827, 17827, 17828), class = "Date")), row.names = c(NA, -6L
), class = "data.frame")
df2 <- structure(list(Date = structure(c(17826, 17827, 17828, 17829,
17830, 17831), class = "Date"), Sales = c(1000L, 2000L, 3000L,
4000L, 5000L, 6000L)), row.names = c(NA, -6L), class = "data.frame")
Answer 1 (score: 0)
Try aggregate:
aggregate(Sales~Date, df2, FUN = mean, na.rm = TRUE)[,2]
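One possible way to combine the per-date totals from aggregate with the date frequencies in df1, as the question describes, is sketched below in base R. It is an illustration rather than part of the answer above; the sample data is rebuilt from the question, and the names totals and freq are hypothetical.

```r
# Sample data rebuilt from the question
df1 <- data.frame(NextBizday = as.Date(c("2018-10-22", "2018-10-22",
                                          "2018-10-23", "2018-10-23",
                                          "2018-10-23", "2018-10-24")))
df2 <- data.frame(Date  = as.Date("2018-10-22") + 0:5,
                  Sales = c(1000L, 2000L, 3000L, 4000L, 5000L, 6000L))

# Total sales per date in df2 (one row per Date)
totals <- aggregate(Sales ~ Date, df2, sum)

# Number of times each date occurs in df1, repeated on every row of df1
freq <- ave(seq_along(df1$NextBizday), df1$NextBizday, FUN = length)

# Look up each df1 date in the totals and divide by its frequency
df1$AvgSales <- totals$Sales[match(df1$NextBizday, totals$Date)] / freq

df1
```

Here match() does the lookup from df1 into the aggregated totals, so dates that appear only in df2 (2018-10-25 onward) are simply ignored.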