Transform multivariate data in R into aggregated tables using dplyr and tidyr

Date: 2016-12-19 17:27:05

Tags: r data.table dplyr tidyr

I am using dplyr and tidyr to aggregate and summarize some multivariate data. How can I present the data in a table-like form?

Data set:

year, division, group, count
2016, utensils, forks, 10
2016, utensils, spoons, 5
2016, utensils, knives, 20
2015, utensils, spoons, 4
2015, utensils, knives, 15
2015, utensils, forks, 11
2016, tools, hammer, 10
2016, tools, wrench, 5
2016, tools, awe, 20
2015, tools, hammer, 4
2015, tools, wrench, 15
2015, tools, awe, 11

I would like to present the information like this:

          2016       2015
        Utensils   Utensils

Forks    count      count
Spoons   count      count
Knives   count      count

          2016       2015
         Tools      Tools

Hammer   count      count
Wrench   count      count
Awe      count      count

1 Answer:

Answer 0 (score: 1)

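Here is one way to do it. This is essentially a reshaping problem, but you first need to split the data frame by the division column and then reshape each subset with dcast: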

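library(reshape2)
# df is the data frame from the question (columns: year, division, group, count)
# split() yields one sub data frame per division; dcast() spreads each one to wide form
lapply(split(df, df$division), function(s) dcast(group ~ year + division, data = s, value.var = "count"))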

#$tools
#   group 2015_tools 2016_tools
#1    awe         11         20
#2 hammer          4         10
#3 wrench         15          5

#$utensils
#   group 2015_utensils 2016_utensils
#1  forks            11            10
#2 knives            15            20
#3 spoons             4             5
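
To make the above reproducible, here is a minimal sketch that rebuilds the question's data by hand and stores the result in a named list. The `df` construction and the name `tables` are my own for illustration; the question does not show how the data frame was created. It assumes reshape2 is installed.

library(reshape2)

# Rebuild the data set from the question (values copied row by row)
df <- data.frame(
  year     = rep(c(2016, 2015, 2016, 2015), each = 3),
  division = rep(c("utensils", "tools"), each = 6),
  group    = c("forks", "spoons", "knives", "spoons", "knives", "forks",
               "hammer", "wrench", "awe", "hammer", "wrench", "awe"),
  count    = c(10, 5, 20, 4, 15, 11, 10, 5, 20, 4, 15, 11),
  stringsAsFactors = FALSE
)

# One wide table per division, returned as a named list
tables <- lapply(
  split(df, df$division),
  function(s) dcast(s, group ~ year + division, value.var = "count")
)

# Individual tables are then available by division name
tables$utensils
tables$tools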

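Alternatively, since each sub data frame contains only one unique division, you can drop it from the column names by leaving it out of the dcast formula, because it adds no extra information: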

# same split, but only year goes on the column side of the formula
lapply(split(df, df$division), function(s) dcast(group ~ year, data = s, value.var = "count"))

#$tools
#   group 2015 2016
#1    awe   11   20
#2 hammer    4   10
#3 wrench   15    5

#$utensils
#   group 2015 2016
#1  forks   11   10
#2 knives   15   20
#3 spoons    4    5
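
Since the question mentions tidyr, the same split-then-reshape idea can probably be written with `pivot_wider()` as well. This is only a sketch, assuming tidyr 1.0.0 or later (where `pivot_wider()` exists); the `df` construction and the name `wide_by_division` are my own for illustration and not part of the question.

library(tidyr)

# Rebuild the data set from the question (values copied row by row)
df <- data.frame(
  year     = rep(c(2016, 2015, 2016, 2015), each = 3),
  division = rep(c("utensils", "tools"), each = 6),
  group    = c("forks", "spoons", "knives", "spoons", "knives", "forks",
               "hammer", "wrench", "awe", "hammer", "wrench", "awe"),
  count    = c(10, 5, 20, 4, 15, 11, 10, 5, 20, 4, 15, 11),
  stringsAsFactors = FALSE
)

wide_by_division <- lapply(
  split(df, df$division),
  function(s) {
    # division is constant within each subset, so drop it before widening
    s$division <- NULL
    # one row per group, one column per year, cells filled from count
    pivot_wider(s, names_from = year, values_from = count)
  }
)

wide_by_division$utensils
wide_by_division$tools

Each list element is a tibble with a group column and one column per year. The row and column order may differ from the dcast output, so sort afterwards if the order matters.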