Question

I am working with a data frame very similar to the following:

Image here, unfortunately don't have enough reputation yet

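This is a data frame with 600,000 rows. For each repeated instance within the same date, I want to divide the cost by the total number of repeated instances. I also want to consider only the rows that fall under the "Sales" tactic.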

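For example, on 1/1/2016 there are 2 "Help Package" rows that are also under the "Sales" tactic. Because there are 2 instances within the same date, I want to divide the cost of each instance by 2 (so each cost would be $5).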

Here is my code:

for(i in 1:length(dfExample$Date)){
  if(dfExample$Tactic[i] == "Sales"){
    list = agrep(dfExample$Package[i], dfExample$Package)
    for(i in list){
      date_repeats = agrep(i, dfExample$Date)
      dfExample$Cost[date_repeats] = dfExample$Package[i]/length(date_repeats)
      }
  }
}

This is very inefficient and slow. I know there must be a better way to achieve this. Any help would be much appreciated. Thanks!

Answer 1

ave() can provide a solution without any additional packages:

with(dfExample, Cost / ave(Cost, Date, Package, Tactic, FUN=length))
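The expression above returns the rescaled costs for every row and does not assign them back. As a minimal sketch, assuming only "Sales" rows should be changed and that the result should overwrite Cost, the same ave() idea could be applied as follows. The sample data here is hypothetical (the column names follow the question, the values are invented for illustration), and the helper names group_size and is_sales are not from the original.

```r
# Hypothetical sample data: column names follow the question,
# values are invented purely for illustration.
dfExample <- data.frame(
  Date    = c("2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"),
  Package = c("Help Package", "Help Package", "Other Package", "Help Package"),
  Tactic  = c("Sales", "Sales", "Marketing", "Sales"),
  Cost    = c(10, 10, 8, 6),
  stringsAsFactors = FALSE
)

# Number of rows in each Date/Package/Tactic group, aligned with the rows.
group_size <- with(dfExample, ave(Cost, Date, Package, Tactic, FUN = length))

# Assumption: only rows with the "Sales" tactic are rescaled;
# every other row keeps its original cost.
is_sales <- dfExample$Tactic == "Sales"
dfExample$Cost[is_sales] <- dfExample$Cost[is_sales] / group_size[is_sales]

print(dfExample)
```

Because ave() returns a vector of the same length as its input, the whole operation stays vectorized and no explicit loop over the 600,000 rows is needed.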

Answer 2

Using dplyr:

library(dplyr)
dfExample %>%
    group_by(Date, Package, Tactic) %>%
    mutate(Cost = Cost / n())

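I'm a little unclear on what you mean by "instance". This code (very clearly) groups by Date, Package, and Tactic, so each unique combination of those columns is treated as one group. If your definition of "instance" does not include Tactic, you can remove it and group by Date and Package only.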
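To make the effect of that grouping choice concrete, here is a small sketch that runs both variants on hypothetical data. The column names follow the question, but the values and the result names by_all and by_date_pkg are invented for illustration.

```r
library(dplyr)

# Hypothetical sample data: three rows share a date and package,
# but only two of them use the "Sales" tactic.
dfExample <- data.frame(
  Date    = c("2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"),
  Package = c("Help Package", "Help Package", "Help Package", "Help Package"),
  Tactic  = c("Sales", "Sales", "Marketing", "Sales"),
  Cost    = c(10, 10, 10, 6),
  stringsAsFactors = FALSE
)

# Variant 1: Tactic is part of the group, so the two "Sales" rows
# form a group of size 2 and the "Marketing" row is a group of size 1.
by_all <- dfExample %>%
  group_by(Date, Package, Tactic) %>%
  mutate(Cost = Cost / n()) %>%
  ungroup()

# Variant 2: Tactic is ignored, so all three rows on 2016-01-01
# fall into one group of size 3.
by_date_pkg <- dfExample %>%
  group_by(Date, Package) %>%
  mutate(Cost = Cost / n()) %>%
  ungroup()

print(by_all)
print(by_date_pkg)
```

In the first variant the two "Sales" rows each end up with a cost of 5, which matches the example in the question. In the second variant the "Marketing" row is counted as well, so each of the three rows is divided by 3.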

How to avoid loops in R when changing a column
