R - Aggregate values based on a date interval and 3 factor variables

Posted: 2015-10-16 18:34:50

Tags: r sum aggregate-functions

I have ExpandedGrid, with 11760 obs. of 4 variables:

Date - date format
Device - factor
Creative - factor
Partner - factor

I also have MediaPlanDF, with 215 obs. of 6 variables:

Interval - an interval of dates I created using lubridate
Partner - factor
Device - factor
Creative - factor
Daily Spend - num
Daily Impressions - num
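To experiment with the two structures above, they can be mocked up in base R. This is a hypothetical sketch: the first plan row borrows the spend and impression figures from the excerpt further down, and everything else is invented. The lubridate `Interval` column is replaced by two `Date` columns, `Start` and `End` (hypothetical names, treated as inclusive), so the example needs no packages. That is why the mock `MediaPlanDF` has 7 columns instead of 6. With lubridate, `int_start()` and `int_end()` return the endpoints of an interval.

```r
# Hypothetical mock-up of the two data frames described above.
# Assumption: the lubridate Interval column is stored as two Date columns,
# Start and End (both inclusive), so only base R is needed.
ExpandedGrid <- data.frame(
  Date     = as.Date(c("2015-08-31", "2015-09-20")),
  Device   = factor(c("Desktop", "Mobile")),
  Creative = factor(c("Standard", "Standard")),
  Partner  = factor(c("ACCUEN", "ACCUEN"))
)

MediaPlanDF <- data.frame(
  Start             = as.Date(c("2015-08-31", "2015-09-15")),
  End               = as.Date(c("2015-10-03", "2015-10-31")),
  Partner           = factor(c("ACCUEN", "ACCUEN")),
  Device            = factor(c("Desktop", "Mobile")),
  Creative          = factor(c("Standard", "Standard")),
  Daily.Spend       = c(1696.27, 100),      # first value borrowed from the excerpt
  Daily.Impressions = c(1000339.17, 5000)   # second row is invented
)

str(ExpandedGrid)
str(MediaPlanDF)
```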

Here is my problem.

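For each row of ExpandedGrid, I need to sum Daily Spend and Daily Impressions from the corresponding columns of MediaPlanDF, subject to the following two criteria: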

Criterion 1

- ExpandedGrid$Device matches MediaPlanDF$Device
- ExpandedGrid$Creative matches MediaPlanDF$Creative
- ExpandedGrid$Partner matches MediaPlanDF$Partner
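On its own, Criterion 1 is a join on the three factor columns. Below is a minimal sketch with base R's `merge()`, a technique chosen here rather than one taken from the question, using hypothetical toy data. Each grid row is paired with every plan row that has the same Device, Creative and Partner. A plain `==` between the two columns does not produce this all-pairs comparison.

```r
# Criterion 1 on its own: pair each grid row with every plan row that has
# the same Device, Creative and Partner. All values are hypothetical.
grid <- data.frame(
  Date     = as.Date(c("2015-08-31", "2015-08-31")),
  Device   = factor(c("Desktop", "Mobile")),
  Creative = factor(c("Standard", "Standard")),
  Partner  = factor(c("ACCUEN", "ACCUEN"))
)
plan <- data.frame(
  Device      = factor(c("Desktop", "Desktop", "Mobile")),
  Creative    = factor(c("Standard", "Standard", "Rich")),
  Partner     = factor(c("ACCUEN", "ACCUEN", "ACCUEN")),
  Daily.Spend = c(10, 20, 30)
)

# Inner join on the three keys. merge() compares the key values, so the
# factor levels do not need to be identical in both data frames
# (Creative has different levels in grid and plan here).
matched <- merge(grid, plan, by = c("Device", "Creative", "Partner"))
print(matched)  # the Desktop grid row matched both Desktop plan rows
```

The Mobile grid row drops out because no plan row has Mobile/Standard/ACCUEN. Adding `all.x = TRUE` would keep it, with `NA` in the plan columns.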

Criterion 2

- ExpandedGrid$Date falls within MediaPlanDF$Interval
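Criterion 2 on its own can be checked for every date against every interval at once. The sketch below assumes, hypothetically, that each interval is held as `Start`/`End` dates and that both endpoints count as inside. That inclusivity is an assumption about how the plan should be read. `outer()` then builds a dates × intervals logical matrix.

```r
# Criterion 2 on its own: which plan intervals contain each grid date?
# Assumption: intervals are stored as inclusive Start/End dates (hypothetical values).
dates <- as.Date(c("2015-08-31", "2015-10-10"))
start <- as.Date(c("2015-08-31", "2015-09-15"))
end   <- as.Date(c("2015-10-03", "2015-10-31"))

# Rows = dates, columns = intervals; TRUE where the date lies inside the interval.
inside <- outer(dates, start, ">=") & outer(dates, end, "<=")
print(inside)
```

Row 1 (2015-08-31) falls only in the first interval and row 2 (2015-10-10) only in the second. With 11760 dates and 215 intervals the matrix has about 2.5 million cells, which is still manageable in memory.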

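I can solve this for each criterion on its own, but I'm having a very hard time combining them without errors. My searches haven't turned up much either: there are lots of good examples, but none I've been able to adapt to my situation. I've tried various approaches, but my thinking is starting to drift toward overly complicated solutions, and I need help.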

I tried indexing like this:

indexb <- as.character(ExpandedGrid$Device) == as.character(MediaPlanDF$Device);
indexc <- as.character(ExpandedGrid$Creative) == as.character(MediaPlanDF$Creative);
indexd <- as.character(ExpandedGrid$Partner) == as.character(MediaPlanDF$Partner);
index <- ExpandedGrid$Date %within% MediaPlanDF$Interval;

KEYDF <- data.frame(index, indexb, indexc, indexd)
KEYDF$Key <- apply(KEYDF, 1, function(x)(all(x) || all(!x)))
KEYDF$Key.cha <- as.character(KEYDF$Key)

outputbydim <- do.call(rbind, lapply(KEYDF$Key.cha, function(x){
  index <- x == "TRUE";
  list(impressions = sum(MediaPlanDF$Daily.Impressions[index]),
       spend = sum(MediaPlanDF$Daily.Spend[index]))}))

Unfortunately, this correctly excludes the values that should not be summed, but the sums for the TRUE rows are wrong.
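The symptom fits three problems in the attempt above. This is a diagnosis sketched with small hypothetical vectors, not a tested reproduction of the real data:

1. `==` between an 11760-element column and a 215-element column compares element by element and recycles the shorter vector. It does not compare every grid row with every plan row. `%within%` with an interval vector appears to work element by element in the same way.
2. `all(x) || all(!x)` is also TRUE when all four conditions are FALSE.
3. Inside `lapply()`, each `x` is a single `"TRUE"`/`"FALSE"` string, so `index` is a scalar. `MediaPlanDF$Daily.Impressions[TRUE]` selects the whole column, so every TRUE row gets the grand total of all 215 plan rows.

```r
# Hypothetical stand-ins for the real columns.
grid_device <- c("Desktop", "Mobile", "Desktop", "Mobile", "Desktop")  # 5 "grid" rows
plan_device <- c("Mobile", "Desktop")                                   # 2 "plan" rows
plan_spend  <- c(10, 20)

# 1. `==` recycles the shorter vector: grid row i is compared with a single plan
#    row only (plan row ((i - 1) %% 2) + 1). Every grid device occurs in the plan,
#    yet no pair lines up. R warns because 5 is not a multiple of 2.
cmp <- suppressWarnings(grid_device == plan_device)
print(cmp)  # FALSE FALSE FALSE FALSE FALSE

# 2. The key function is also TRUE when all four conditions are FALSE.
key_fn <- function(x) all(x) || all(!x)
print(key_fn(c(FALSE, FALSE, FALSE, FALSE)))  # TRUE

# 3. Inside lapply(), x is a single "TRUE"/"FALSE" string, so the index is a
#    scalar: TRUE selects the whole spend column, FALSE selects nothing.
per_row <- sapply(c("TRUE", "FALSE"), function(x) sum(plan_spend[x == "TRUE"]))
print(per_row)  # TRUE = 30 (column total), FALSE = 0
```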

Here is an excerpt of the data:

ExpandedGrid:

Date          Device     Creative     Partner
2015-08-31   "Desktop"  "Standard"   "ACCUEN"

MediaPlanDF:

Interval                                            Device     Creative     Partner     Daily Spend      Daily Impressions
2015-08-30 17:00:00 PDT--2015-10-03 17:00:00 PDT   "Desktop"  "Standard"   "ACCUEN"     1696.27          1000339.17
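Here is one way to combine both criteria, starting from the excerpt above. It is a minimal sketch, not a definitive implementation, and it assumes the intervals have been turned into inclusive `Start`/`End` Date columns (hypothetical names). The sketch left-joins the grid to the plan on the three factors, zeroes out plan rows whose interval does not contain the date, and sums what remains per grid row. The 17:00 PDT endpoints in the excerpt are exactly midnight UTC. That suggests, as an assumption, that the intervals were built from dates in UTC. If so, something like `as.Date(lubridate::int_start(MediaPlanDF$Interval), tz = "UTC")` should recover the start dates.

```r
# Hypothetical toy data; the first plan row borrows spend/impressions from the
# excerpt, with its interval assumed to cover 2015-08-31 to 2015-10-03 inclusive.
ExpandedGrid <- data.frame(
  Date     = as.Date(c("2015-08-31", "2015-09-20", "2015-10-10")),
  Device   = factor(c("Desktop", "Desktop", "Mobile")),
  Creative = factor(c("Standard", "Standard", "Standard")),
  Partner  = factor(c("ACCUEN", "ACCUEN", "ACCUEN"))
)
MediaPlanDF <- data.frame(
  Start             = as.Date(c("2015-08-31", "2015-09-15")),
  End               = as.Date(c("2015-10-03", "2015-10-31")),
  Partner           = factor(c("ACCUEN", "ACCUEN")),
  Device            = factor(c("Desktop", "Desktop")),
  Creative          = factor(c("Standard", "Standard")),
  Daily.Spend       = c(1696.27, 100),
  Daily.Impressions = c(1000339.17, 5000)
)

keys <- c("Device", "Creative", "Partner")

# Criterion 1: left join, so grid rows with no matching plan row survive (with NAs).
m <- merge(ExpandedGrid, MediaPlanDF, by = keys, all.x = TRUE)

# Criterion 2: keep a plan row's values only if the date lies inside its interval.
# For unmatched rows, !is.na(m$Start) is FALSE, so hit is FALSE (never NA).
hit <- !is.na(m$Start) & m$Date >= m$Start & m$Date <= m$End
m$Daily.Spend[!hit]       <- 0
m$Daily.Impressions[!hit] <- 0

# Sum over all plan rows that satisfied both criteria, per grid row.
result <- aggregate(cbind(Daily.Spend, Daily.Impressions) ~ Date + Device + Creative + Partner,
                    data = m, FUN = sum)
print(result)
```

Before aggregation the join can grow to at most 11760 × 215 rows (about 2.5 million) in the worst case. Because of `all.x = TRUE`, grid rows with no qualifying plan row come out as 0 rather than disappearing.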

Does anyone know where to go from here?

Thanks in advance!
