Error when calculating group relative frequencies for different combinations

Asked: 2015-07-02 07:53:06

Tags: r dplyr apply

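I want to compute the frequency and relative frequency of categorical variables for different combinations of grouping variables. I have computed the frequencies, but I have not managed to pipe that output into the relative frequency calculation. Can someone help me identify the error?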

# Random generation of values for categorical data
set.seed(33)
df <- data.frame(cat1 = sample( LETTERS[1:2], 100, replace=TRUE ), 
                cat2 = sample( LETTERS[3:5], 100, replace=TRUE ),
                cat3 = sample( LETTERS[2:4], 100, replace=TRUE ),
                var1 = sample( LETTERS[1:3], 100, replace=TRUE ), 
                var2 = sample( LETTERS[3:8], 100, replace=TRUE ),
                var3 = sample( LETTERS[2:3], 100, replace=TRUE ),
                vre1 = sample( LETTERS[2:7], 100, replace=TRUE ), 
                vre2 = sample( LETTERS[1:5], 100, replace=TRUE ),
                ref3 = sample( LETTERS[2:9], 100, replace=TRUE ),
                con1 = runif(100,0,100),
                con2 = runif(100,23,45))

# Calculating the frequency
library(dplyr)
cat.names <- c('var1','var3','vre2','ref3')
df %>% group_by(cat1, cat3) %>% summarise_each(funs(n = n()), one_of(cat.names))

# Piping it to calculate the relative frequency/Percentage
df %>% group_by(cat1, cat3) %>% summarise_each(funs(n = n()), one_of(cat.names)) %>% mutate(freq = n / sum(n))

# Error
Error: invalid 'type' (closure) of argument

# Expected output
  cat1 cat3 var1.freq var3.freq vre2.freq ref3.freq  var1.rfreq  var3.rfreq  vre2.rfreq  ref3.rfreq
1    A    B         8         8         8         8 0.153846154 0.153846154 0.153846154 0.153846154
2    A    C        27        27        27        27 0.519230769 0.519230769 0.519230769 0.519230769
3    A    D        17        17        17        17 0.326923077 0.326923077 0.326923077 0.326923077
4    B    B        16        16        16        16 0.333333333 0.333333333 0.333333333 0.333333333
5    B    C        12        12        12        12        0.25        0.25        0.25        0.25
6    B    D        20        20        20        20 0.416666667 0.416666667 0.416666667 0.416666667

2 Answers:

Answer 0 (score: 2)

Another solution, using data.table:

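library(data.table)
dt <- as.data.table(df)  # the [, .(), by = ] syntax needs a data.table, not a plain data.frame
result <- dt[, .(fr.v1 = sum(table(var1)), fr.v2 = sum(table(var2))),
    by = .(cat1, cat3)][, prop.v1 := fr.v1 / sum(fr.v1), by = cat1]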

For simplicity, I only compute the frequencies for var1 and var2, but extending the code to more columns is straightforward.
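One possible way to do that extension is sketched below. It is not the answerer's code: the toy data, the names `dt`, `counts`, `freq.cols` and `rfreq.cols`, and the `.freq`/`.rfreq` suffixes (chosen to mirror the expected output in the question) are all assumptions made for illustration. It counts non-missing values per column, which is what `sum(table(x))` does as well.

```r
library(data.table)

# Toy data standing in for the question's df (hypothetical values)
dt <- data.table(
  cat1 = c("A", "A", "A", "B", "B", "B", "B"),
  cat3 = c("B", "C", "C", "B", "B", "D", "D"),
  var1 = c("A", "B", "C", "A", "A", "B", "C"),
  var3 = c("B", "C", "B", "C", "B", "C", "B")
)
cat.names  <- c("var1", "var3")
freq.cols  <- paste0(cat.names, ".freq")
rfreq.cols <- paste0(cat.names, ".rfreq")

# Count the non-missing values of each listed column per (cat1, cat3) group
counts <- dt[, lapply(.SD, function(x) sum(!is.na(x))),
             by = .(cat1, cat3), .SDcols = cat.names]
setnames(counts, cat.names, freq.cols)

# Relative frequency within each cat1 level, added by reference
counts[, (rfreq.cols) := lapply(.SD, function(x) x / sum(x)),
       by = cat1, .SDcols = freq.cols]

print(counts)
```

With the question's data, `cat.names` would be `c('var1','var3','vre2','ref3')` and `dt` would come from `as.data.table(df)`.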

Answer 1 (score: 1)

Try

 df1 <- df %>%
          group_by(cat1, cat3) %>%
          summarise_each(funs(n()), one_of(cat.names))
 df2 <- df1 %>%
            group_by(cat1) %>% 
            mutate_each(funs(./sum(.)), var1:ref3)
 bind_cols(df1, df2[-(1:2)])
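The error in the question most likely arises because the summarised data has no column called `n`, so `n / sum(n)` picks up dplyr's function `n` (a closure) instead. Note also that `summarise_each()`, `mutate_each()` and `funs()` have since been deprecated in dplyr. A rough equivalent with `across()` (dplyr 1.0.0 or later) might look like the sketch below. The toy data and the `.freq`/`.rfreq` column suffixes are assumptions for illustration, and the counts use `sum(!is.na(.x))`, which matches the group size only when there are no missing values.

```r
library(dplyr)

# Toy data standing in for the question's df (hypothetical values)
df <- data.frame(
  cat1 = c("A", "A", "A", "B", "B", "B", "B"),
  cat3 = c("B", "C", "C", "B", "B", "D", "D"),
  var1 = c("A", "B", "C", "A", "A", "B", "C"),
  var3 = c("B", "C", "B", "C", "B", "C", "B")
)
cat.names <- c("var1", "var3")

# Step 1: counts per (cat1, cat3) combination
counts <- df %>%
  group_by(cat1, cat3) %>%
  summarise(across(all_of(cat.names), ~ sum(!is.na(.x))), .groups = "drop")

# Step 2: share of each combination within its cat1 level,
# then label the count columns so both sets of names are distinct
result <- counts %>%
  group_by(cat1) %>%
  mutate(across(all_of(cat.names), ~ .x / sum(.x), .names = "{.col}.rfreq")) %>%
  ungroup() %>%
  rename_with(~ paste0(.x, ".freq"), all_of(cat.names))

print(result)
```

Keeping counts and proportions in one pipeline avoids the `bind_cols()` step, which would otherwise combine two sets of columns with identical names.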