Question

Say I have the following data frame (df):

Class   Occupation  X   Y
Lower   Agriculture 1   0
Upper   Agriculture 0   1
Upper   Agriculture 1   1
Upper   Agriculture 0   0
Upper   Business    1   0
Lower   Business    1   1
Lower   Business    0   0
Lower   Business    1   0

and I would like to summarise it into the tables below:

Occupation  X   Y
Agriculture 2   2
Business    3   1


Class   X   Y
Lower   3   1
Upper   2   2

At the moment I have to run the following,

table(df$Class, df$X)
table(df$Class, df$Y)
table(df$Occupation, df$X)
table(df$Occupation, df$Y)

and then combine the results by hand. Is there a better way to do this when I have many columns?

Answer 1

You can use aggregate:

aggregate(cbind(X, Y) ~ Occupation, df, FUN = sum)
#   Occupation X Y
#1 Agriculture 2 2
#2    Business 3 1

aggregate(cbind(X, Y) ~ Class, df, FUN = sum)
#  Class X Y
#1 Lower 3 1
#2 Upper 2 2

Another way is xtabs:

xtabs(cbind(X, Y) ~ Occupation, df)

#Occupation    X Y
#  Agriculture 2 2
#  Business    3 1

xtabs(cbind(X, Y) ~ Class, df)

#Class   X Y
#  Lower 3 1
#  Upper 2 2

To automate this over several grouping columns:

lapply(c('Class', 'Occupation'), function(x) {

  myform <- as.formula(paste('cbind(X, Y) ~', x))
  xtabs(myform, df)

})
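
For the many-columns case raised in the question, the same idea can be sketched without spelling out each value column in a formula. This is only an illustration: the data frame is rebuilt from the question, and the names group_cols, value_cols and res are made up for the example. It assumes every column that is not a grouping column should be summed.

```r
# Data frame rebuilt from the question
df <- data.frame(
  Class      = c("Lower", "Upper", "Upper", "Upper",
                 "Upper", "Lower", "Lower", "Lower"),
  Occupation = c(rep("Agriculture", 4), rep("Business", 4)),
  X          = c(1, 0, 1, 0, 1, 1, 0, 1),
  Y          = c(0, 1, 1, 0, 0, 1, 0, 0)
)

# Assumption: everything except the grouping columns is a 0/1 column to sum
group_cols <- c("Class", "Occupation")
value_cols <- setdiff(names(df), group_cols)

# One summary table per grouping column, returned as a named list
res <- lapply(setNames(group_cols, group_cols), function(g) {
  aggregate(df[value_cols], by = df[g], FUN = sum)
})

res$Class
res$Occupation
```

Each element of res is an ordinary data frame, so the two tables can be inspected or exported separately.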

Answer 2

If I have understood correctly, you can use the dplyr package and its group_by function to achieve this.

You can combine group_by with summarise_each in the following way:

summarise_each

I think this should work in your case.
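
As a rough sketch of what the group_by approach might look like: this assumes dplyr 1.0 or later, where summarise() with across() is the current replacement for the older summarise_each(), so it is not necessarily the exact call this answer had in mind. The data frame is rebuilt from the question, and by_occupation and by_class are names chosen for the example.

```r
library(dplyr)

# Data frame rebuilt from the question
df <- data.frame(
  Class      = c("Lower", "Upper", "Upper", "Upper",
                 "Upper", "Lower", "Lower", "Lower"),
  Occupation = c(rep("Agriculture", 4), rep("Business", 4)),
  X          = c(1, 0, 1, 0, 1, 1, 0, 1),
  Y          = c(0, 1, 1, 0, 0, 1, 0, 0)
)

# Sum every numeric column within each occupation
by_occupation <- df %>%
  group_by(Occupation) %>%
  summarise(across(where(is.numeric), sum))

# Same summary, grouped by class instead
by_class <- df %>%
  group_by(Class) %>%
  summarise(across(where(is.numeric), sum))

by_occupation
by_class
```

Selecting with where(is.numeric) means further 0/1 columns are picked up without being listed by name, which is the situation described in the question.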

Create a contingency table for many columns in R?
