Aggregate counts of factor levels, by factor

Time: 2016-05-14 18:30:28

Tags: r dplyr plyr reshape reshape2

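I have been trying to build a table that shows the counts of factor levels broken down by another factor. I have gone through dozens of pages and questions, and tried functions from several packages (dplyr, reshape), but I could not get them to work correctly.

This is what I have so far: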

# my data:
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1,var2,var3,var4)
head(mydata)

Attempt n + 1: this only shows a single total per factor.

t(aggregate(. ~ var1, mydata, sum))

      [,1]   [,2] 
var1 "blue" "red"
var2 " 5"   "12" 
var3 " 5"   "18" 
var4 " 6"   "16" 

Attempt n + 2: this gives the right format, but I can't get it to work on more than one factor.

library(plyr)  # ddply() and summarise() as used here come from plyr, not dplyr
data1 <- ddply(mydata, c("var1", "var3"), summarise,
            N    = length(var1))
library(reshape)
df1 <- cast(data1, var1 ~ var3, sum)
df1 <- t(df1)
df1

   blue red
1    3   6
2    1   3
3    0   2

What I want is:

        blue red
var2.0    3  10
var2.1    1   1
var3.1    3   6
var3.2    1   3
var3.3    0   2
var4.0    2   6
var4.1    2   5

How can I get this format? Many thanks in advance.

1 answer:

Answer 0: (score: 3)

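We can melt the dataset with 'var1' as the id variable, paste each variable name and value together into a single label, and then use table.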

library(reshape2)
tbl <- table(transform(melt(mydata, id.var="var1"),
        varN = paste(variable, value, sep="."))[c(4,1)])
names(dimnames(tbl)) <- NULL
tbl 
#
#         blue red
#  var2.0    3  10
#  var2.1    1   1
#  var3.1    3   6
#  var3.2    1   3
#  var3.3    0   2
#  var4.0    2   6
#  var4.1    2   5
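
For comparison, the same cross-tabulation can be sketched in base R without reshape2. This is not the answerer's method, only a minimal illustration: it tabulates each variable against var1 and stacks the pieces with rbind. The names `vars`, `tabs` and `res` are hypothetical.

```r
# Data from the question
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1, var2, var3, var4)

# Variables to tabulate against var1 (hypothetical helper name)
vars <- c("var2", "var3", "var4")

# One contingency table per variable, with rows relabelled as "<variable>.<level>"
tabs <- lapply(vars, function(v) {
  tb <- unclass(table(mydata[[v]], mydata$var1))  # plain integer matrix
  dimnames(tb) <- list(paste(v, rownames(tb), sep = "."), colnames(tb))
  tb
})

# Stack the per-variable tables into one matrix
res <- do.call(rbind, tabs)
res
```

Because each piece is tabulated against the same var1 vector, the columns line up and rbind can stack them directly.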

Or, using dplyr/tidyr, we convert the 'wide' dataset to 'long' format with gather, then unite the columns ('var', 'val') to create 'varV', get the frequency (tally) after grouping by 'var1' and 'varV', and then spread the result back to 'wide' format.
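
The steps just described can be sketched roughly as below. This is a reconstruction from the description, not necessarily the answerer's exact code: the column names `var`, `val` and `varV` follow the text, while the `ungroup()` call and the `fill = 0` argument are assumptions added so that the reshaping step does not depend on leftover grouping and so that missing combinations show as 0.

```r
library(dplyr)
library(tidyr)

# Data from the question
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1, var2, var3, var4)

res <- mydata %>%
  gather(var, val, -var1) %>%        # wide -> long: one row per (var1, variable, value)
  unite(varV, var, val, sep = ".") %>%  # e.g. "var2" and "0" become "var2.0"
  group_by(var1, varV) %>%
  tally() %>%                        # frequency of each (var1, varV) pair, in column n
  ungroup() %>%                      # assumption: drop grouping before reshaping
  spread(var1, n, fill = 0)          # long -> wide: one column per level of var1
res
```

In current tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider, but they are still available and match the functions named in the answer.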