I have this dataset:
test <- data.frame(c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2,1, 2, 3, 2, 2),
c(10, 10, 10,8,1, NA,8, NA, 6, NA, 9, 10, 8, 5, 8))
names(test) <- c("Group", "Q1")
I want to apply the following function:
nps.exc <- function(x){
exc <- subset(x, x<11)
result <- data.frame("Detractors" = integer(0),
"Passives" = integer(0), "Promoters" = integer(0))
result[1,1] <- (length(which(exc < 7)))/length(exc)
result[1,2] <- (length(which(exc == 7| exc == 8)))/length(exc)
result[1,3] <- (length(which(exc == 9| exc == 10)))/length(exc)
result
}
When I run the function on the whole dataset, I get the three results (Detractors / Passives / Promoters):
nps.exc(test$Q1)
However, I want the results broken down by group (column 1). When I use aggregate, I lose the three separate results:
aggregate(Q1 ~ Group, test, nps.exc)
Total newbie here. What am I missing?
Answer 0: (score: 1)
You can change the last line of 'nps.exc' to do.call(rbind, result):
nps.exc <- function(x){
exc <- subset(x, x<11)
result <- data.frame("Detractors" = integer(0),
"Passives" = integer(0), "Promoters" = integer(0))
result[1,1] <- (length(which(exc < 7)))/length(exc)
result[1,2] <- (length(which(exc == 7| exc == 8)))/length(exc)
result[1,3] <- (length(which(exc == 9| exc == 10)))/length(exc)
do.call(rbind, result)
}
Then, in the aggregate call, wrap the result in do.call(data.frame, ...):
res <- do.call(data.frame,aggregate(Q1 ~ Group, test, nps.exc))
str(res)
#'data.frame': 3 obs. of 4 variables:
#$ Group: num 1 2 3
#$ Q1.1 : num 0 0.375 0
#$ Q1.2 : num 0 0.375 0.5
#$ Q1.3 : num 1 0.25 0.5
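As a rough illustration of why the do.call(data.frame, ...) wrapper matters: when the summary function returns more than one value, aggregate stores them in a single matrix column, and do.call(data.frame, ...) spreads that matrix into ordinary columns. The helper two.stats below is a made-up stand-in for nps.exc, used only to keep the sketch short.

```r
test <- data.frame(
  Group = c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2),
  Q1    = c(10, 10, 10, 8, 1, NA, 8, NA, 6, NA, 9, 10, 8, 5, 8)
)

# Hypothetical summary function (not from the question): returns two named values.
two.stats <- function(x) c(mean = mean(x), n = length(x))

agg <- aggregate(Q1 ~ Group, test, two.stats)
ncol(agg)          # 2: Group plus one matrix column named Q1
is.matrix(agg$Q1)  # TRUE

# Spread the matrix column into separate data frame columns.
flat <- do.call(data.frame, agg)
names(flat)        # "Group" "Q1.mean" "Q1.n"
```

The same idea applies to res above. Since its columns come out as Q1.1, Q1.2 and Q1.3, something like names(res)[-1] <- c("Detractors", "Passives", "Promoters") would restore the labels.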
Answer 1: (score: 1)
This is a natural place to use table, which excludes NAs by default, and prop.table, which converts a table from counts to proportions:
nps.exc <- function(x){
xf <- factor(findInterval(x,c(7,9,11)),levels=c("0","1","2"))
setNames(prop.table(table(xf)),c("Detractors","Passives","Promoters"))
}
aggregate(Q1 ~ Group, test, nps.exc)
# Group Q1.Detractors Q1.Passives Q1.Promoters
# 1 1 0.000 0.000 1.000
# 2 2 0.375 0.375 0.250
# 3 3 0.000 0.500 0.500
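The intermediate steps inside nps.exc can be inspected on a small vector. This is only a sketch: the scores in x are made up for illustration and are not the question's data.

```r
x <- c(1, 6, 7, 8, 9, 10, 11, NA)   # made-up scores, including an out-of-range 11 and an NA

# Interval index: 0 for x < 7, 1 for 7-8, 2 for 9-10, 3 for 11 and above.
idx <- findInterval(x, c(7, 9, 11))
idx                                  # 0 0 1 1 2 2 3 NA

# Only levels "0", "1", "2" are kept, so index 3 becomes NA.
xf <- factor(idx, levels = c("0", "1", "2"))

counts <- table(xf)                  # NAs are dropped by default
props  <- prop.table(counts)         # counts converted to proportions
props
```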
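How it works: findInterval maps x to intervals based on the cut points c(7,9,11), returning 0 for anything below the first cut point. The factor part ensures that all three cases (< 7, 7-8, 9-10) are counted even if they do not appear in x, and that a fourth case (11+) maps to NA.
Efficiency: it would be more efficient to define xf once for all of Q1 rather than separately for each Group: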
nps.exc.g <- function(x,g){
xf <- factor(findInterval(x,c(7,9,11)),levels=c("0","1","2"))
levels(xf) <- c("Detractors","Passives","Promoters")
prop.table(table(g,xf),1)
}
with(test,nps.exc.g(Q1,Group))
# xf
# g Detractors Passives Promoters
# 1 0.000 0.000 1.000
# 2 0.375 0.375 0.250
# 3 0.000 0.500 0.500
The downside here is that the result is a table-class object, which is a pain to work with.
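One possible way around that, sketched here rather than taken from the answer, is to convert the two-way table with as.data.frame.matrix, which keeps the wide layout. The names wide and out are illustrative.

```r
test <- data.frame(
  Group = c(1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2),
  Q1    = c(10, 10, 10, 8, 1, NA, 8, NA, 6, NA, 9, 10, 8, 5, 8)
)

nps.exc.g <- function(x, g) {
  xf <- factor(findInterval(x, c(7, 9, 11)), levels = c("0", "1", "2"))
  levels(xf) <- c("Detractors", "Passives", "Promoters")
  prop.table(table(g, xf), 1)   # row-wise proportions per group
}

tab <- with(test, nps.exc.g(Q1, Group))

# Keep the wide layout: one row per group, one column per category.
wide <- as.data.frame.matrix(tab)

# Move the group labels from row names into a regular column.
out <- data.frame(Group = as.numeric(rownames(wide)), wide, row.names = NULL)
out
```

A plain as.data.frame(tab) would also work, but it returns the long format (one row per group/category pair with a Freq column), which may or may not be what is wanted.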