aggregate not returning all calculations

Posted: 2015-06-17 15:08:11

Tags: r

I have this dataset:

test <- data.frame(c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2,1, 2, 3, 2, 2),
       c(10, 10, 10,8,1, NA,8, NA, 6, NA, 9, 10, 8, 5, 8))
names(test) <- c("Group", "Q1")

I want to apply the following function:

nps.exc <- function(x){
  exc <- subset(x, x<11)
  result <- data.frame("Detractors" = integer(0),
         "Passives" = integer(0), "Promoters" = integer(0))
  result[1,1] <- (length(which(exc < 7)))/length(exc)
  result[1,2] <- (length(which(exc == 7| exc == 8)))/length(exc)
  result[1,3] <- (length(which(exc == 9| exc == 10)))/length(exc)
  result
}

When I run the function on the whole dataset, I get three results (Detractors / Passives / Promoters):

nps.exc(test$Q1)
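To show where those three numbers come from, here is a hand tally for the whole column. It is a sketch that counts each category among the non-missing scores below 11 using the same rules as nps.exc; the names q and counts are illustrative only.

```r
# Same data as in the question, with the column names given directly
test <- data.frame(Group = c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2),
                   Q1    = c(10, 10, 10, 8, 1, NA, 8, NA, 6, NA, 9, 10, 8, 5, 8))

# Keep only valid scores: drop NA and anything outside the 0-10 scale
q <- test$Q1[!is.na(test$Q1) & test$Q1 < 11]

# Count each category with the same rules as nps.exc
counts <- c(Detractors = sum(q < 7),
            Passives   = sum(q %in% c(7, 8)),
            Promoters  = sum(q %in% c(9, 10)))
counts              # 3, 4 and 5 out of 12 valid scores
counts / length(q)  # 0.25, 0.3333333, 0.4166667
```

These are the same three proportions that nps.exc(test$Q1) reports for the whole dataset.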

However, I want the results broken down by group (column 1). When I use aggregate, I lose the three separate results:

aggregate(Q1 ~ Group, test, nps.exc)

Total newbie here; what am I missing?
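For comparison: aggregate() simplifies each group's result to a vector or matrix when it can, which suits a FUN that returns a plain numeric vector better than one returning a one-row data frame. The sketch below keeps the question's nps.exc unchanged and does the grouping with split() and lapply() instead; this is a swapped-in technique rather than one of the answers, and the names by_group and res are illustrative.

```r
test <- data.frame(Group = c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2),
                   Q1    = c(10, 10, 10, 8, 1, NA, 8, NA, 6, NA, 9, 10, 8, 5, 8))

# The question's function, unchanged: it returns a one-row data frame.
# subset() also drops NA scores, because NA conditions are treated as FALSE.
nps.exc <- function(x){
  exc <- subset(x, x < 11)
  result <- data.frame("Detractors" = integer(0),
                       "Passives" = integer(0), "Promoters" = integer(0))
  result[1, 1] <- (length(which(exc < 7))) / length(exc)
  result[1, 2] <- (length(which(exc == 7 | exc == 8))) / length(exc)
  result[1, 3] <- (length(which(exc == 9 | exc == 10))) / length(exc)
  result
}

# Split Q1 by Group, apply nps.exc to each piece, then stack the one-row results
by_group <- lapply(split(test$Q1, test$Group), nps.exc)
res <- do.call(rbind, by_group)
res$Group <- as.numeric(names(by_group))  # recover the group labels
res
```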

2 Answers:

Answer 0 (score: 1)

You can change the last line of 'nps.exc' to do.call(rbind, result), so that it returns a 3 x 1 matrix instead of a one-row data frame:

nps.exc <- function(x){
  exc <- subset(x, x<11)
  result <- data.frame("Detractors" = integer(0),
         "Passives" = integer(0), "Promoters" = integer(0))
  result[1,1] <- (length(which(exc < 7)))/length(exc)
  result[1,2] <- (length(which(exc == 7| exc == 8)))/length(exc)
  result[1,3] <- (length(which(exc == 9| exc == 10)))/length(exc)
  do.call(rbind, result)
}

and use it in aggregate:
 res <- do.call(data.frame,aggregate(Q1 ~ Group, test, nps.exc))
 str(res)
 #'data.frame': 3 obs. of  4 variables:
 #$ Group: num  1 2 3
 #$ Q1.1 : num  0 0.375 0
 #$ Q1.2 : num  0 0.375 0.5
 #$ Q1.3 : num  1 0.25 0.5
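A minimal variant of the same idea, assuming a plain named numeric vector is an acceptable return value: aggregate() then stores the per-group results as a matrix column, and the names carry through to the final columns. The function name nps.vec is hypothetical; it applies the same category rules as nps.exc.

```r
test <- data.frame(Group = c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2),
                   Q1    = c(10, 10, 10, 8, 1, NA, 8, NA, 6, NA, 9, 10, 8, 5, 8))

# Hypothetical variant of nps.exc returning a named numeric vector
nps.vec <- function(x) {
  exc <- x[!is.na(x) & x < 11]            # valid 0-10 scores only
  c(Detractors = mean(exc < 7),           # mean of a logical = share of TRUE
    Passives   = mean(exc %in% c(7, 8)),
    Promoters  = mean(exc %in% c(9, 10)))
}

# aggregate() returns a matrix column named Q1;
# do.call(data.frame, ...) splits it into ordinary columns
res <- do.call(data.frame, aggregate(Q1 ~ Group, test, nps.vec))
str(res)
```

Because the vector is named, the columns come out as Q1.Detractors, Q1.Passives and Q1.Promoters rather than Q1.1 to Q1.3.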

Answer 1 (score: 1)

This is a natural place to use table, which excludes NAs by default, and prop.table, which converts a table of counts into proportions:

nps.exc <- function(x){
    xf <- factor(findInterval(x,c(7,9,11)),levels=c("0","1","2"))
    setNames(prop.table(table(xf)),c("Detractors","Passives","Promoters"))
}

aggregate(Q1 ~ Group, test, nps.exc)
#   Group Q1.Detractors Q1.Passives Q1.Promoters
# 1     1         0.000       0.000        1.000
# 2     2         0.375       0.375        0.250
# 3     3         0.000       0.500        0.500

How it works:

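  • findInterval maps x to intervals based on the cut points c(7,9,11), with 0 for anything below the first cut point.
  • The factor step ensures that all three cases (< 7, 7-8, 9-10) are counted even when some of them do not appear in x, and that the fourth case (11+) maps to NA.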
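A small illustration of the two steps above on a handful of made-up scores (the vector scores is illustrative only), including one out-of-range value and one NA:

```r
# One score from each bin, plus an out-of-range value and a missing value
scores <- c(1, 6, 7, 8, 9, 10, 11, NA)

# Cut points 7, 9, 11: x < 7 -> 0, 7 <= x < 9 -> 1, 9 <= x < 11 -> 2, x >= 11 -> 3
bins <- findInterval(scores, c(7, 9, 11))
bins  # 0 0 1 1 2 2 3 NA

# Only "0", "1", "2" are allowed levels, so bin 3 (the score of 11) becomes NA
xf <- factor(bins, levels = c("0", "1", "2"))
table(xf)  # NAs are left out of the counts by default: 2 in each level
```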

Efficiency. It is more efficient to define xf once for all of Q1 rather than separately for each Group:

nps.exc.g <- function(x,g){
    xf         <- factor(findInterval(x,c(7,9,11)),levels=c("0","1","2"))
    levels(xf) <- c("Detractors","Passives","Promoters")
    prop.table(table(g,xf),1)
}

with(test,nps.exc.g(Q1,Group))
#    xf
# g   Detractors Passives Promoters
#   1      0.000    0.000     1.000
#   2      0.375    0.375     0.250
#   3      0.000    0.500     0.500

The downside here is that the result is a table-class object, which is a pain to work with.
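If a plain data frame is needed afterwards, one option is as.data.frame.matrix(), which keeps the wide layout with one column per category; plain as.data.frame() on a table would instead give the long format (g, xf, Freq). This is a sketch; the names props and res are illustrative.

```r
test <- data.frame(Group = c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2),
                   Q1    = c(10, 10, 10, 8, 1, NA, 8, NA, 6, NA, 9, 10, 8, 5, 8))

nps.exc.g <- function(x, g){
  xf         <- factor(findInterval(x, c(7, 9, 11)), levels = c("0", "1", "2"))
  levels(xf) <- c("Detractors", "Passives", "Promoters")
  prop.table(table(g, xf), 1)  # row proportions within each group
}

props <- with(test, nps.exc.g(Q1, Group))

# Move the group labels from the row names into a real column
res <- data.frame(Group = as.numeric(rownames(props)),
                  as.data.frame.matrix(props),
                  row.names = NULL)
res
```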