I'm using dplyr and tidyr to aggregate and summarize some multivariate data. How can I present the data in a table-like form?
Dataset:
year, division, group, count
2016, utensils, forks, 10
2016, utensils, spoons, 5
2016, utensils, knives, 20
2015, utensils, spoons, 4
2015, utensils, knives, 15
2015, utensils, forks, 11
2016, tools, hammer, 10
2016, tools, wrench, 5
2016, tools, awe, 20
2015, tools, hammer, 4
2015, tools, wrench, 15
2015, tools, awe, 11
I'd like to present the information like this:
          2016      2015
          Utensils  Utensils
Forks     count     count
Spoons    count     count
Knives    count     count

          2016      2015
          Tools     Tools
Hammer    count     count
Wrench    count     count
Awe       count     count
Answer 0 (score: 1)
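You can try the following. This is basically a reshaping problem, but you first need to split the data frame by the division column and then use dcast to reshape each subset: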
library(reshape2)
lapply(split(df, df$division), function(s) dcast(group ~ year + division, data = s, value.var = "count"))
#$tools
#   group 2015_tools 2016_tools
#1    awe         11         20
#2 hammer          4         10
#3 wrench         15          5
#$utensils
#   group 2015_utensils 2016_utensils
#1  forks            11            10
#2 knives            15            20
#3 spoons             4             5
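For anyone who wants to run this end to end, here is a minimal sketch that first rebuilds `df` from the data posted in the question. It assumes the last row was meant to read `awe, 11` (the comma is missing in the post), and the name `tables` is only there for illustration. `dcast` is called with the data frame as its first argument, which should be equivalent to the call above.

```r
library(reshape2)

# Read the question's data; strip.white = TRUE drops the spaces after the commas.
# Assumption: the last row is "awe, 11" (the posted row lacks the comma).
df <- read.csv(text = "year, division, group, count
2016, utensils, forks, 10
2016, utensils, spoons, 5
2016, utensils, knives, 20
2015, utensils, spoons, 4
2015, utensils, knives, 15
2015, utensils, forks, 11
2016, tools, hammer, 10
2016, tools, wrench, 5
2016, tools, awe, 20
2015, tools, hammer, 4
2015, tools, wrench, 15
2015, tools, awe, 11", strip.white = TRUE)

# One wide table per division: groups in rows, year_division in columns.
tables <- lapply(split(df, df$division), function(s) {
  dcast(s, group ~ year + division, value.var = "count")
})

print(tables)
```

The result is a named list, so a single table can be pulled out with `tables$utensils` or `tables[["tools"]]`.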
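Alternatively, since each sub data frame contains only one unique division, you can drop it from the column names by leaving it out of the dcast formula, as it adds no extra information: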
lapply(split(df, df$division), function(s) dcast(group ~ year, data = s, value.var = "count"))
#$tools
#   group 2015 2016
#1    awe   11   20
#2 hammer    4   10
#3 wrench   15    5
#$utensils
#   group 2015 2016
#1  forks   11   10
#2 knives   15   20
#3 spoons    4    5
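Since the question mentions tidyr, the same split-then-reshape idea can probably be written with `pivot_wider` instead of `dcast`. This is only a sketch under a few assumptions: tidyr 1.1.0 or later (for `names_sort`), a hand-built `df` that mirrors the question's data with the last row read as `awe, 11`, and an illustrative result name `wide`.

```r
library(tidyr)

# Hand-built copy of the question's data (assumption: last row is "awe", 11).
df <- data.frame(
  year     = rep(c(2016, 2015, 2016, 2015), each = 3),
  division = rep(c("utensils", "tools"), each = 6),
  group    = c("forks", "spoons", "knives", "spoons", "knives", "forks",
               "hammer", "wrench", "awe", "hammer", "wrench", "awe"),
  count    = c(10, 5, 20, 4, 15, 11, 10, 5, 20, 4, 15, 11),
  stringsAsFactors = FALSE
)

# Split by division, then spread the years into columns for each subset.
# id_cols = group leaves division out of the result, as in the second dcast call.
# names_sort = TRUE orders the year columns (2015 before 2016).
wide <- lapply(split(df, df$division), function(s) {
  pivot_wider(s, id_cols = group, names_from = year,
              values_from = count, names_sort = TRUE)
})

print(wide)
```

Unlike `dcast`, `pivot_wider` returns tibbles and keeps the groups in their order of appearance rather than sorting them alphabetically.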