How do I summarize categorical data in R?

Time: 2019-04-02 16:26:46

Tags: r aggregate

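I have a data frame with two columns of categorical variables (Better, Similar, Worse). I would like to produce a table that counts how many times each category appears in each of the two columns. The data frame I am working with looks like this: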

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to end up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you approach this?

3 Answers:

Answer 0 (score: 7)

As mentioned in the comments, table is the standard tool for this, for example:

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0
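
If the result is needed as a data frame shaped like the one in the question, rather than as a table object, the first variant can be wrapped in as.data.frame.matrix. This is only a sketch: the construction of DT below is an assumption about how the sample data was created (character columns, not factors, since stack() only keeps vector columns).

```r
# Assumed reconstruction of the sample data from the question
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Cross-tabulate value by column name, as in the first variant above
tab <- table(stack(DT))

# Convert the 2-d table to a data frame, one column per original column;
# the categories end up as row names
res <- as.data.frame.matrix(tab)
print(res)
```

The categories become row names of res; the counts themselves are the same as in the tables above.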

Answer 1 (score: 3)

sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0
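
A variant of the same per-column idea, sketched here under the assumption that df1 holds character columns as in the question: give every column the same set of factor levels first, then let table() count each column. Categories missing from a column then show up as 0 instead of being dropped. The name lv is introduced only for this illustration.

```r
# Assumed reconstruction of the sample data from the question
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Common level set across both columns (sorted for a stable row order)
lv <- sort(unique(unlist(df1)))

# Tabulate each column against the shared levels; sapply binds the
# per-column counts into a matrix with categories as row names
counts <- sapply(df1, function(col) table(factor(col, levels = lv)))
print(counts)
```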

Answer 2 (score: 2)

One possibility with dplyr and tidyr is:

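library(dplyr)
library(tidyr)

df %>%
 gather(var, val) %>%            # wide to long: column name in var, value in val
 count(var, val) %>%             # count each (var, val) pair
 spread(var, n, fill = 0)        # back to wide, 0 for missing pairs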

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

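First, it converts the data from wide to long format: the "var" column holds the variable names and the "val" column holds the corresponding values. Second, it counts the rows by "var" and "val". Finally, it spreads the data back into the desired wide format, filling missing combinations with 0.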
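
The three steps described above (reshape to long, count, spread back to wide) can also be sketched with pivot_longer() and pivot_wider(), which tidyr now recommends over gather() and spread(). This assumes tidyr 1.0.0 or later; the data frame construction and the intermediate names long, counts and wide are added here only for illustration.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the sample data from the question
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Step 1: wide to long; "var" holds the column name, "val" the category
long <- df %>%
  pivot_longer(everything(), names_to = "var", values_to = "val")

# Step 2: count each (var, val) combination
counts <- long %>%
  count(var, val)

# Step 3: long to wide again, filling absent combinations with 0
wide <- counts %>%
  pivot_wider(names_from = var, values_from = n, values_fill = list(n = 0L))

print(wide)
```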

Alternatively, with dplyr and reshape2, you could do:

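library(dplyr)
library(reshape2)

df %>%
 mutate(rowid = row_number()) %>%     # row id so melt has an id variable
 melt(., id.vars = "rowid") %>%       # wide to long: variable, value
 count(variable, value) %>%           # count each (variable, value) pair
 dcast(value ~ variable, value.var = "n", fill = 0)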

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0