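I want to do some data manipulation with dplyr. Background: I have a survey weight and a bunch of variables (mostly Likert items). I want to sum the frequencies and percentages per category, both with and without the survey weight.
As an example, let us just use the frequencies for the gender variable. The result should look like this: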
gender freq freq.weighted
1 292 922.2906
2 279 964.7551
9 6 21.7338
I will do this for many variables, so I decided to put the dplyr code inside a function. That way I only have to change the variable and can type less.
#exampledata
gender<-c("2","2","1","2","2","2","2","2","2","2","2","2","1","1","2","2","2","2","2","2","1","2","2","2","2","2","2","2","2","2")
survey_weight<-c("2.368456","2.642901","2.926698","3.628653","3.247463","3.698195","2.776772","2.972387","2.686365","2.441820","3.494899","3.133106","3.253514","3.138839","3.430597","3.769577","3.367952","2.265350","2.686365","3.189538","3.029999","3.024567","2.972387","2.730978","4.074495","2.921552","3.769577","2.730978","3.247463","3.230097")
test_dataframe<-data.frame(gender,survey_weight)
#function
weighting.function<-function(dataframe,variable){
  test_weighted<- dataframe %>%
    group_by_(variable) %>%
    summarise_(interp(freq=count(~weight)),
               interp(freq_weighted=sum(~weight)))
  return(test_weighted)
}
result_dataframe<-weighting.function(test_dataframe,"gender")
#this second step was left out in this example:
#mutate_(perc=interp(~freq/sum(~freq)*100),perc_weighted=interp(~freq_weighted/sum(~freq_weighted)*100))
This results in the following error message:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "formula"
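I have tried many different things. At first I used freq=n() to count the frequencies, but I always got an error (I checked that plyr was loaded before dplyr and not after, and that did not help either).
Any ideas? I read the vignette on standard evaluation, but I keep running into problems and do not know what the solution is.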
Answer 0 (score: 13)
I think you have a few nested mistakes that are causing you problems. The biggest one is using count() instead of summarise(). I am guessing you wanted n():
weighting.function <- function(dataframe, variable){
  dataframe %>%
    group_by_(variable) %>%
    summarise_(
      freq = ~n(),
      freq_weighted = ~sum(survey_weight)
    )
}
weighting.function(test_dataframe, ~gender)
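To make the difference between n() and count() concrete, here is a minimal sketch in current dplyr syntax (plain verbs rather than the underscore versions, and assuming dplyr >= 0.8 for the name argument of count()). The data frame toy and its column w are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy data; w stands in for a numeric survey weight.
toy <- data.frame(
  gender = c("1", "2", "2"),
  w      = c(1.5, 2.0, 2.5)
)

# n() is used inside summarise() and returns the number of rows per group.
via_n <- toy %>%
  group_by(gender) %>%
  summarise(freq = n(), freq_weighted = sum(w))

# count() is a verb of its own: it groups and tallies the data frame directly.
via_count <- toy %>% count(gender, name = "freq")

# With wt, count() sums the weights per group instead of counting rows.
via_count_wt <- toy %>% count(gender, wt = w, name = "freq_weighted")
```

Both routes give one row per gender. count() takes a data frame as its first argument, so it cannot be called inside summarise() the way n() can.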
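You also have some unneeded uses of interp(). If you do use interp(), the call should look like freq = interp(~n()), i.e. the name is outside the call to interp, and the thing being interpolated starts with ~.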
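In current dplyr the underscore verbs (group_by_(), summarise_()) are deprecated, and interp() is not needed for this kind of function. The following is a sketch of the same idea, not the code from the answer. It assumes dplyr >= 1.0 and column names passed as strings. The function name weighted_freq, its weight argument, and the data frame small are made up for illustration. It also converts the weights to numeric, because the sample data stores them as quoted strings, and adds the percentage step that the question left out.

```r
library(dplyr)

# Hypothetical helper: `variable` and `weight` are column names given as strings.
weighted_freq <- function(dataframe, variable, weight = "survey_weight") {
  dataframe %>%
    # across(all_of()) selects the grouping column by its string name.
    group_by(across(all_of(variable))) %>%
    summarise(
      freq = n(),
      # The weights may be character or factor, so convert before summing.
      freq_weighted = sum(as.numeric(as.character(.data[[weight]])))
    ) %>%
    # Percentages within the summarised table, unweighted and weighted.
    mutate(
      perc = freq / sum(freq) * 100,
      perc_weighted = freq_weighted / sum(freq_weighted) * 100
    )
}

# First four rows of the question's sample data.
small <- data.frame(
  gender = c("2", "2", "1", "2"),
  survey_weight = c("2.368456", "2.642901", "2.926698", "3.628653")
)
result <- weighted_freq(small, "gender")
```

Passing the column as a string keeps the call close to the original weighting.function(test_dataframe, "gender"). If unquoted column names are preferred, the embrace operator {{ }} would be the usual alternative.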