Pivot-Like Table in R

Date: 2016-10-18 22:05:46

Tags: r dplyr

I created a pivot-like table using dplyr inside dcast. Just curious whether there is a better way to do this.

library(dplyr)
library(reshape2)

state.x77 <- as.data.frame(state.x77)

state.x77$Population_bucket <- ifelse(state.x77$Population >= 10000, 'Large',
                               ifelse(state.x77$Population >= 1000, 'Medium', 'Small'))
state.x77$Income_bucket <- ifelse(state.x77$Income >= 4700, 'High',
                           ifelse(state.x77$Income >= 4100, 'Medium', 'Low'))

dcast(state.x77 %>%
        group_by(Income_bucket, Population_bucket) %>%
        summarise(sum(Area)),
      Income_bucket ~ Population_bucket)

1 Answer:

Answer 0 (score: 0)

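There are many ways to create a table like this in R, and dcast is a good one. As for the code, you can use the cut function instead of ifelse to create the grouping columns, and you can do the grouping inside the dplyr chain. It may also be clearer to call dcast at the end of the chain rather than wrapping the chain inside dcast. Note that the example reuses the labels Small/Medium/Large for the income buckets, where the question used Low/Medium/High. For example: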

labs = c("Small","Medium","Large")

state.x77 %>% 
  group_by(Population_bucket = cut(Population, breaks=c(0, 1000, 10000, Inf), 
                                   labels= labs, right=FALSE),
           Income_bucket = cut(Income, breaks=c(0, 4100, 4700, Inf), 
                               labels=labs, right=FALSE)) %>%
  summarise(sum(Area)) %>%
  dcast(Income_bucket ~ Population_bucket)

  Income_bucket  Small Medium  Large
1         Small   9267 740233     NA
2        Medium 411498 662657 348075
3         Large 754001 351123 259940
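
As a quick sanity check on the cut approach, the sketch below compares the nested ifelse buckets from the question with the cut buckets for Population. It assumes only the built-in state.x77 dataset; the names df, pop_ifelse, pop_cut and buckets_match are made up for illustration.

```r
# Sketch: check that cut(..., right = FALSE) reproduces the nested ifelse buckets.
df <- as.data.frame(state.x77)   # state.x77 ships with R as a matrix

# Nested ifelse, as in the question
pop_ifelse <- ifelse(df$Population >= 10000, "Large",
              ifelse(df$Population >= 1000, "Medium", "Small"))

# cut with left-closed intervals: [0, 1000), [1000, 10000), [10000, Inf)
pop_cut <- cut(df$Population,
               breaks = c(0, 1000, 10000, Inf),
               labels = c("Small", "Medium", "Large"),
               right  = FALSE)

# cut returns a factor, so compare as character
buckets_match <- all(as.character(pop_cut) == pop_ifelse)
print(buckets_match)

# Counts per bucket, listed in factor-level order
print(table(pop_cut))
```

Because cut returns a factor whose levels follow the order of labels, the pivoted columns should come out as Small, Medium, Large, whereas plain character buckets from ifelse would typically sort alphabetically.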

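If you are creating an output table meant to be read by people, you can also format the values with a thousands separator. In that case, the summarise line changes to: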

summarise(format(sum(Area), big.mark=",")) %>%

The output is:

  Income_bucket   Small  Medium   Large
1         Small   9,267 740,233    <NA>
2        Medium 411,498 662,657 348,075
3         Large 754,001 351,123 259,940
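
One thing to keep in mind with this formatting step is that format returns character strings, so the formatted columns can no longer be used for arithmetic (this is also why the empty cell prints as <NA> rather than NA). A minimal sketch, using values copied from the table above and illustrative variable names:

```r
# Sketch: format() with big.mark returns character strings, not numbers.
total_area <- 740233                       # illustrative value from the table above
formatted  <- format(total_area, big.mark = ",")

print(formatted)          # "740,233"
print(class(formatted))   # "character"

# Keep the numeric values for calculations and format only for display.
areas   <- c(Small = 9267, Medium = 740233)
display <- vapply(areas, format, character(1), big.mark = ",")
print(display)
```

If the numbers are needed later, one option is to keep the unformatted summary and apply format only when printing the final table.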