Question

I have a data table (temp3) that looks like this (the original table has about 1 million rows):

creative_code  reqcount  hasbought  numclick  FeedbackCPM  bidvalue_CPMf  browser
79             5         1          0         19           9              C
1              0         0          0         39           50             C
79             3         1          0         1205         684            C
1              7         1          5         82           159            C
1              9         0          3         15           77             C
79             5         0          0         1575         700            C
1              0         0          0         95           300            C
1              4         1          4         95           300            C
1              3         0          0         1            300            C
1              8         0          0         30           65             C
1              9         1          0         17           293            C
1              4         0          1         140          300            IE
79             4         0          0         838          271            F
79             7         1          2         0            13             C
1              9         2          0         67           160            C
79             2         0          0         268          176            F
79             0         1          23        1634         700            C
79             1         0          0         0            300            C
79             5         0          0         143          87             C
79             7         2          0         0            9              IE
1              3         0          0         178          300            IE
1              7         0          0         111          200            F

What I need is the mean for every creative_code combined with reqcount, hasbought, and hasclick. I can find the mean for creative_code + reqcount on its own using the command - aggregate(bidvalue_CPMf~creative_code+reqcount,data=temp3,FUN=mean)

However, if I use the following code, I get an error -

Code - 
 for (j in names(temp3))        aggregate(bidvalue_CPMf~creative_code+j,data=temp3,FUN=mean)
Error - Error in model.frame.default(formula = bidvalue_CPMf ~ creative_code +  :   variable lengths differ (found for 'j')

Please help.

Answer 1

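What you need is as.formula. Inside the loop, j holds a column name as a character string, so the formula has to be built from text with paste() and then converted.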

df <- read.table("clipboard", header = T)
Columns <- names(df)[!names(df) %in% c("bidvalue_CPMf", "creative_code")]

for (j in Columns){
  fo <- as.formula(paste("bidvalue_CPMf~creative_code+",j))
  print(aggregate(fo,data=df,FUN=mean))
}

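If you only need the analysis for reqcount, hasbought, and the click column (called hasclick in the question, but named numclick in the sample data), use

Columns <- c("reqcount", "hasbought", "numclick")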
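
As a variation on the loop above, here is a minimal sketch that builds the same formulas with reformulate() from base R and collects the results in a named list instead of printing them. The small temp3 data frame and the group_cols name are made up for illustration and are not the asker's data.

```r
# Made-up sample with the same column names as temp3 (values are illustrative only)
temp3 <- data.frame(
  creative_code = c(1, 1, 79, 79, 1, 79),
  reqcount      = c(0, 3, 3, 5, 3, 5),
  hasbought     = c(0, 1, 1, 0, 0, 1),
  bidvalue_CPMf = c(50, 300, 684, 700, 100, 9)
)

# Hypothetical name for the columns to pair with creative_code
group_cols <- c("reqcount", "hasbought")

# reformulate() turns the strings into bidvalue_CPMf ~ creative_code + <column>
results <- lapply(group_cols, function(j) {
  fo <- reformulate(c("creative_code", j), response = "bidvalue_CPMf")
  aggregate(fo, data = temp3, FUN = mean)
})
names(results) <- group_cols

# One aggregated data frame per grouping column
results$reqcount
```

Keeping the results in a list makes it possible to inspect or combine them afterwards, which print() inside a for loop does not allow.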

Answer 2

You can try

  nm1 <- names(temp3)[2:4]
  lapply(nm1, function(x) {
    aggregate(temp3['bidvalue_CPMf'], by = c(temp3['creative_code'], temp3[x]),
                     FUN=mean)
   })

[[1]]
   creative_code reqcount bidvalue_CPMf
1              1        0      175.0000
2             79        0      700.0000
3             79        1      300.0000
4             79        2      176.0000
5              1        3      300.0000
6             79        3      684.0000
7              1        4      300.0000
8             79        4      271.0000
9             79        5      265.3333
10             1        7      179.5000
11            79        7       11.0000
12             1        8       65.0000
13             1        9      176.6667

[[2]]
  creative_code hasbought bidvalue_CPMf
1             1         0      199.0000
2            79         0      306.8000
3             1         1      250.6667
4            79         1      351.5000
5             1         2      160.0000
6            79         2        9.0000

[[3]]
  creative_code numclick bidvalue_CPMf
1             1        0         208.5
2            79        0         279.5
3             1        1         300.0
4            79        2          13.0
5             1        3          77.0
6             1        4         300.0
7             1        5         159.0
8            79       23         700.0

Checking the result against the individual formula method

aggregate(bidvalue_CPMf~creative_code+hasbought, temp3, FUN=mean)
  creative_code hasbought bidvalue_CPMf
1             1         0      199.0000
2            79         0      306.8000
3             1         1      250.6667
4            79         1      351.5000
5             1         2      160.0000
6            79         2        9.0000

If the dataset is large, you can use dplyr or data.table

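library(dplyr)
lapply(nm1, function(x){
             temp3 %>%
                # group_by_() is deprecated; across(all_of()) accepts column names as strings (dplyr >= 1.0.0)
                group_by(across(all_of(c('creative_code', x)))) %>% 
                summarise(bidvalue_CPMf=mean(bidvalue_CPMf), .groups='drop')})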

Using data.table

library(data.table)
setDT(temp3)
lapply(nm1, function(x) temp3[, .(bidvalue_CPMf=mean(bidvalue_CPMf)) , 
                      c('creative_code', x)])

Aggregate function in R inside a for loop
