Question

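I'm working with an 18-column data frame; the columns I care about are CPM and SpendRange. SpendRange is split into ranges from 1 to 3000 in steps of 50.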

I'm trying to average CPM (cost per thousand impressions) within each spend range and produce a data frame containing each spend range and its average CPM.

I tried:

  CPMbySpend<-aggregate(Ads$CPM,by=list(Ads$SpendRange),function(x) paste0(sort(unique(x)),collapse=mean(Ads$CPM))
  CPMbySpend<-data.frame(CPMbySpend)

Obviously I found that I can't use collapse as a function... any suggestions?

The ideal output would be:

  1-50   | mean(allvalues with spendrange 1-50)
  51-100 | mean(allvalues with spendrange 51-100)

Answer 1

Using the new dataset:

  Ads <- read.csv("Test.csv", header=TRUE, stringsAsFactors=FALSE)
  Ads$CPM <- as.numeric(Ads$CPM) # non-numeric elements (e.g. `-$`) will be coerced to NA
  #Warning message:
  #NAs introduced by coercion 

  res <- aggregate(Ads$CPM,by=list(SpendRange=Ads$SpendRange),FUN=mean, na.rm=TRUE)
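For anyone without Test.csv, here is a minimal sketch of the same idea on made-up data. The toy values are assumptions for illustration only. It uses the formula interface of aggregate, which names the result column CPM instead of x and drops rows with NA by default (na.action = na.omit), so it is a variant of the call above rather than an exact copy.

```r
# Toy data standing in for Ads (values are invented for illustration)
Ads <- data.frame(
  SpendRange = c("1-50", "1-50", "51-100", "51-100", "51-100"),
  CPM        = c("2.0", "4.0", "3.0", "-$", "5.0"),
  stringsAsFactors = FALSE
)

# Non-numeric entries such as "-$" become NA; the coercion warning is silenced here
Ads$CPM <- suppressWarnings(as.numeric(Ads$CPM))

# One row per SpendRange; rows with NA in CPM are dropped before averaging
res <- aggregate(CPM ~ SpendRange, data = Ads, FUN = mean)
print(res)
```

The by=list(...) form shown above works just as well; the formula form is only a little shorter when both columns live in the same data frame.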

If you want SpendRange ordered as 0-1, 1-50, and so on, one option is mixedorder from the gtools package.

  library(gtools)
  res1 <- res[mixedorder(res$SpendRange),] 
  row.names(res1) <- NULL
  head(res1)
  # SpendRange        x
  #1       0-1  1.621987
  #2      1-50  2.519853
  #3    51-100  3.924538
  #4   101-150  5.010795
  #5   151-200  3.840549
  #6   201-250  4.286923
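If installing gtools is not an option, a base R alternative (not the approach used above, just a sketch) is to sort on the numeric lower bound of each range. This assumes every label has the form "lower-upper"; the toy res below is invented for illustration.

```r
# Toy aggregate result in arbitrary order (values are invented)
res <- data.frame(
  SpendRange = c("101-150", "1-50", "51-100", "0-1"),
  x          = c(5.0, 2.5, 3.9, 1.6),
  stringsAsFactors = FALSE
)

# Lower bound of each range: everything before the hyphen, as a number
lower <- as.numeric(sub("-.*$", "", res$SpendRange))

# Reorder rows by that lower bound and reset the row names
res1 <- res[order(lower), ]
row.names(res1) <- NULL
print(res1)
```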

Alternatively, you can change the order by converting SpendRange to a factor with explicitly specified levels.

  res1$SpendRange <- factor(res1$SpendRange, levels= c('0-1', '1-50',.....)) #pseudocode

Then use

  res1[order(res1$SpendRange),]
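The levels do not have to be typed by hand. The sketch below builds them programmatically, assuming the labels follow the pattern "0-1", "1-50", "51-100", ... up to "2951-3000"; that pattern is a guess based on the output shown earlier, so adjust it to match the real data. The toy res1 is invented for illustration.

```r
# Toy result in arbitrary order (values are invented)
res1 <- data.frame(
  SpendRange = c("51-100", "0-1", "101-150", "1-50"),
  x          = c(3.9, 1.6, 5.0, 2.5),
  stringsAsFactors = FALSE
)

# Assumed label scheme: "0-1", then "1-50", "51-100", ..., "2951-3000"
starts <- c(1, seq(51, 2951, by = 50))
ends   <- seq(50, 3000, by = 50)
lvls   <- c("0-1", paste(starts, ends, sep = "-"))

# Labels missing from lvls would turn into NA, so check before converting
stopifnot(all(res1$SpendRange %in% lvls))

res1$SpendRange <- factor(res1$SpendRange, levels = lvls)
res1 <- res1[order(res1$SpendRange), ]
print(res1)
```

A side benefit of the factor approach is that plots and tables built from res1 will follow the same level order without further sorting.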
