Contingency table across data.frame columns

Date: 2016-09-01 06:02:50

Tags: r dataframe dataset aggregate

I am trying to create a four-way contingency table from my dataset. My dataset looks like this:

a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- sample(letters[25:26], 12, replace = TRUE)
group2 <- sample(letters[7:10], 12, replace = TRUE)

df <- data.frame(a, b, group1, group2)

I tried using the aggregate function. Creating a three-way contingency table works fine:

aggregate(cbind(a, b) ~ group1, data = df, FUN = table)
  group1 a.0 a.1 b.0 b.1
1      y   3   4   3   4
2      z   2   3   2   3

However, when I add a second grouping variable, the output becomes confusing and is not what I want:

aggregate(. ~ group1 + group2, data = df, FUN = table)
  group1 group2    a    b
1      y      g    3    3
2      z      g    1    1
3      z      h    1    1
4      y      i    1    1
5      z      i    1    1
6      y      j 2, 1    3
7      z      j 1, 1 1, 1

Since my original dataset is very large, I would appreciate an elegant, automated way to handle this.

2 answers:

Answer 0: (score: 1)

The expected output is not clear. Perhaps melt/dcast is what is needed:

library(data.table)
dcast(melt(setDT(df), id.var = c("group1", "group2")), 
                       group1 + group2 ~variable + value, length)

Or use recast (a wrapper for melt/dcast from reshape2):

library(reshape2)
recast(df, measure.var = c("a", "b"), ... ~ variable + value, length)
#    group1 group2 a_0 a_1 b_0 b_1
#1      y      g   1   4   3   2
#2      y      h   1   0   1   0
#3      y      j   1   1   0   2
#4      z      g   2   0   0   2
#5      z      i   0   1   0   1
#6      z      j   0   1   1   0
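For comparison, base R can produce the same kind of counts without extra packages. The following is only a minimal sketch, assuming a small hand-made data frame (`toy` is a hypothetical stand-in, not the OP's data) and using `xtabs` with `ftable` to lay out the multi-way table:

```r
# Hypothetical toy data, used only for illustration
toy <- data.frame(
  a      = c(1, 0, 1, 1),
  group1 = c("y", "y", "z", "z"),
  group2 = c("g", "g", "g", "h")
)

# xtabs() counts rows for every combination of the listed variables,
# including combinations that never occur (these get a count of 0)
counts <- xtabs(~ group1 + group2 + a, data = toy)

# ftable() flattens the three-way array into a two-dimensional display
print(ftable(counts))

# Individual cells can be looked up by level name
counts["y", "g", "1"]
```

Adding `b` to the right-hand side of the formula would give the four-way table; whether that layout or the wide `a_0 a_1 b_0 b_1` layout above is preferable depends on what the OP needs.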

The OP's aggregate call gives this output:

aggregate(. ~ group1 + group2, data = df, FUN = table)
#  group1 group2    a    b
#1      y      g 1, 4 3, 2
#2      z      g    2    2
#3      y      h    1    1
#4      z      i    1    1
#5      y      j 1, 1    2
#6      z      j    1    1

If we want aggregate to report both levels for every group, convert each column to a factor with the levels specified and then call table:

do.call(data.frame, aggregate(cbind(a, b) ~ group1 + group2, data = df, 
              FUN = function(x) table(factor(x, levels = 0:1))))
#  group1 group2 a.0 a.1 b.0 b.1
#1      y      g   1   4   3   2
#2      z      g   2   0   0   2
#3      y      h   1   0   1   0
#4      z      i   0   1   0   1
#5      y      j   1   1   0   2
#6      z      j   0   1   1   0
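The reason the factor conversion helps can be seen in isolation. As a small sketch (the vector `x` is made up for illustration), `table` on a plain numeric vector only reports the values that are present, so groups return results of different lengths, while a factor with declared levels always yields one count per level:

```r
# A group in which every value is 1 (hypothetical example data)
x <- c(1, 1, 1)

# Plain table() reports only the value that occurs: a single cell
plain <- table(x)
length(plain)

# Declaring both levels up front keeps a zero count for the absent level
fixed <- table(factor(x, levels = 0:1))
length(fixed)
as.vector(fixed)   # counts for levels "0" and "1"
names(fixed)
```

With a fixed length of two per column, aggregate can bind the per-group results into a regular matrix instead of a list column with entries such as `1, 1`.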

If we want all the combinations, including those with no rows, dcast has a drop = FALSE argument:

dcast(melt(setDT(df), id.var = c("group1", "group2")), group1 + group2 ~
                   variable + value, length, drop = FALSE)

Or in recast:

recast(df, measure.var = c("a", "b"), ... ~ variable + value, length, drop = FALSE) 

NOTE: There was no set.seed before the sample calls, so the output shown here will differ from the OP's output.
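To make the example reproducible, the data could be built after fixing the seed. This is only a sketch: the helper `make_df` and the seed value 42 are arbitrary choices for illustration, not something from the question, and the sampled letters can still differ between R versions that use different sampling algorithms.

```r
# Hypothetical helper: rebuilds the example data with a fixed seed
make_df <- function(seed) {
  set.seed(seed)  # fix the random number generator state
  a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
  b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
  group1 <- sample(letters[25:26], 12, replace = TRUE)
  group2 <- sample(letters[7:10], 12, replace = TRUE)
  data.frame(a, b, group1, group2)
}

df <- make_df(42)

# The same seed gives the same groups on repeated calls in one session
identical(df, make_df(42))
```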

Answer 1: (score: 1)

This may be a bit complicated, but perhaps it will help. As far as I understand, you just want to count, so this might be useful:

if (Input::get('test1') == Input::get('test2')) {
     $test = 'required';
}
else {
     $test = '';
}

'test' => [
              'integer',
              'min:1',
              "$test",
          ],