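I want to compute the frequency and relative frequency of several categorical variables for different combinations of grouping variables. I have computed the frequencies, but I have not managed to pipe that output into the relative-frequency calculation. Can someone help me identify the error?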
# Random generation of values for categorical data
set.seed(33)
df <- data.frame(cat1 = sample( LETTERS[1:2], 100, replace=TRUE ),
                 cat2 = sample( LETTERS[3:5], 100, replace=TRUE ),
                 cat3 = sample( LETTERS[2:4], 100, replace=TRUE ),
                 var1 = sample( LETTERS[1:3], 100, replace=TRUE ),
                 var2 = sample( LETTERS[3:8], 100, replace=TRUE ),
                 var3 = sample( LETTERS[2:3], 100, replace=TRUE ),
                 vre1 = sample( LETTERS[2:7], 100, replace=TRUE ),
                 vre2 = sample( LETTERS[1:5], 100, replace=TRUE ),
                 ref3 = sample( LETTERS[2:9], 100, replace=TRUE ),
                 con1 = runif(100,0,100),
                 con2 = runif(100,23,45))
# Calculating the frequency
library(dplyr)
cat.names <- c('var1','var3','vre2','ref3')
df %>% group_by(cat1, cat3) %>% summarise_each(funs(n = n()), one_of(cat.names))
# Piping it to calculate the relative frequency/Percentage
df %>% group_by(cat1, cat3) %>% summarise_each(funs(n = n()), one_of(cat.names)) %>% mutate(freq = n / sum(n))
# Error
Error: invalid 'type' (closure) of argument
#Expected Output
  cat1 cat3 var1.freq var3.freq vre2.freq ref3.freq  var1.rfreq  var3.rfreq  vre2.rfreq  ref3.rfreq
1    A    B         8         8         8         8 0,153846154 0,153846154 0,153846154 0,153846154
2    A    C        27        27        27        27 0,519230769 0,519230769 0,519230769 0,519230769
3    A    D        17        17        17        17 0,326923077 0,326923077 0,326923077 0,326923077
4    B    B        16        16        16        16 0,333333333 0,333333333 0,333333333 0,333333333
5    B    C        12        12        12        12        0,25        0,25        0,25        0,25
6    B    D        20        20        20        20 0,416666667 0,416666667 0,416666667 0,416666667
Answer 0 (score: 2)
Another solution, using data.table:
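library(data.table)
dt <- as.data.table(df)  # the [, .(...), by = ] syntax needs a data.table, not a plain data.frame
result <- dt[, .(fr.v1 = sum(table(var1)), fr.v2 = sum(table(var2))),
             by = .(cat1, cat3)][, prop.v1 := fr.v1 / sum(fr.v1), by = cat1]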
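For simplicity I only compute the frequencies of var1 and var2 (and the relative frequency of var1 only), but extending the code to the other variables is straightforward.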
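One possible way to extend this to every variable in a name vector is to let `.SD` and `.SDcols` do the looping. The sketch below is only an illustration under stated assumptions: `toy` is a small made-up data set standing in for the question's `df`, `rel.names` is a hypothetical helper vector, and `sum(!is.na(x))` is used in place of `sum(table(x))`, which should give the same count for these columns.

```r
library(data.table)

# Small hypothetical data set standing in for the question's df
toy <- data.table(
  cat1 = c("A", "A", "A", "B", "B", "B"),
  cat3 = c("B", "B", "C", "B", "C", "C"),
  var1 = c("A", "B", "C", "A", "B", "C"),
  var3 = c("B", "C", "B", "C", "B", "C")
)
cat.names <- c("var1", "var3")
rel.names <- paste0(cat.names, ".rfreq")  # hypothetical names for the new columns

# Count the non-missing values of every listed variable per cat1/cat3 cell;
# keyby sorts the result by the grouping columns
res <- toy[, lapply(.SD, function(x) sum(!is.na(x))),
           keyby = .(cat1, cat3), .SDcols = cat.names]

# Add each count's share of its cat1 total, by reference
res[, (rel.names) := lapply(.SD, function(x) x / sum(x)),
    by = cat1, .SDcols = cat.names]

print(res)
```

With this layout, adding another variable should only require appending its name to `cat.names`.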
Answer 1 (score: 1)
Try
df1 <- df %>%
  group_by(cat1, cat3) %>%
  summarise_each(funs(n()), one_of(cat.names))
df2 <- df1 %>%
  group_by(cat1) %>%
  mutate_each(funs(./sum(.)), var1:ref3)
bind_cols(df1, df2[-(1:2)])
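As for why the original pipeline fails: the summarised table has no column called `n`, so inside `mutate()` the name `n` most likely resolves to the function `dplyr::n`, and `sum()` of a function gives the "invalid 'type' (closure)" error. Separately, `summarise_each()`, `mutate_each()` and `funs()` are deprecated in newer dplyr releases. A rough equivalent with `across()` might look like the sketch below, assuming dplyr 1.0.0 or later; `toy` is a small made-up data set standing in for the question's `df`, and the `.freq` / `.rfreq` suffixes simply mirror the expected output.

```r
library(dplyr)

# Small hypothetical data set standing in for the question's df
toy <- data.frame(
  cat1 = c("A", "A", "A", "B", "B", "B"),
  cat3 = c("B", "B", "C", "B", "C", "C"),
  var1 = c("A", "B", "C", "A", "B", "C"),
  var3 = c("B", "C", "B", "C", "B", "C")
)
cat.names <- c("var1", "var3")

res <- toy %>%
  group_by(cat1, cat3) %>%
  # count the non-missing values of each variable in every cat1/cat3 cell
  summarise(across(all_of(cat.names), ~ sum(!is.na(.x))), .groups = "drop") %>%
  group_by(cat1) %>%
  # divide each count by the total of its cat1 group, in new *.rfreq columns
  mutate(across(all_of(cat.names), ~ .x / sum(.x), .names = "{.col}.rfreq")) %>%
  ungroup() %>%
  # label the raw counts as *.freq to match the expected output
  rename_with(~ paste0(.x, ".freq"), all_of(cat.names))

print(res)
```

Because the relative frequencies are written to new columns via `.names`, the counts and the shares end up in one table and no `bind_cols()` step is needed.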