I built a pivot-table-like table by running a dplyr chain inside dcast. I'm just curious whether there is a better way to do this.

    library(dplyr)
    library(reshape2)

    state.x77 <- as.data.frame(state.x77)
    state.x77$Population_bucket <- ifelse(state.x77$Population >= 10000, 'Large',
                                          ifelse(state.x77$Population >= 1000, 'Medium', "Small"))
    state.x77$Income_bucket <- ifelse(state.x77$Income >= 4700, 'High',
                                      ifelse(state.x77$Income >= 4100, 'Medium', "Low"))

    dcast(state.x77 %>%
            group_by(Income_bucket, Population_bucket) %>%
            summarise(sum(Area)),
          Income_bucket ~ Population_bucket)

Answer 0 (score: 0)
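There are many ways to create a table like this in R, and dcast is a good one. On the coding side, you can use the cut function instead of nested ifelse calls to create the bucket columns, and you can do the bucketing inside the dplyr chain itself. It is also probably clearer to call dcast at the end of the chain rather than wrapping the chain inside dcast. For example: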
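
    labs = c("Small", "Medium", "Large")

    state.x77 %>%
      # Create both bucket columns inside group_by; right=FALSE makes the intervals left-closed
      group_by(Population_bucket = cut(Population, breaks=c(0, 1000, 10000, Inf),
                                       labels=labs, right=FALSE),
               Income_bucket = cut(Income, breaks=c(0, 4100, 4700, Inf),
                                   labels=labs, right=FALSE)) %>%
      summarise(sum(Area)) %>%
      # Reshape to wide form: one row per income bucket, one column per population bucket
      dcast(Income_bucket ~ Population_bucket)
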
      Income_bucket  Small Medium  Large
    1         Small   9267 740233     NA
    2        Medium 411498 662657 348075
    3         Large 754001 351123 259940

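To see why cut with right=FALSE can stand in for the nested ifelse calls, here is a minimal base-R sketch. The population vector is a made-up set of values placed on and around the bucket boundaries (it is not taken from state.x77), and the variable names are only for illustration.

```r
# Hypothetical sample values chosen to sit on and around the bucket boundaries.
population <- c(365, 999, 1000, 9999, 10000, 21198)

# Nested ifelse, as in the question.
via_ifelse <- ifelse(population >= 10000, "Large",
                     ifelse(population >= 1000, "Medium", "Small"))

# cut() with right = FALSE builds left-closed intervals:
# [0, 1000), [1000, 10000), [10000, Inf).
via_cut <- cut(population,
               breaks = c(0, 1000, 10000, Inf),
               labels = c("Small", "Medium", "Large"),
               right = FALSE)

# cut() returns a factor whose levels follow the order of `labels`,
# so compare the two results as character vectors.
identical(via_ifelse, as.character(via_cut))
levels(via_cut)
```

Because the result is a factor with levels in the order Small, Medium, Large, the buckets keep that order in the reshaped table instead of being sorted alphabetically as plain character values would be.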
If the output table is meant to be read by people, you can also format the values with a thousands separator. In that case, the summarise line changes to:

      summarise(format(sum(Area), big.mark=",")) %>%

The output is then:

      Income_bucket   Small  Medium   Large
    1         Small   9,267 740,233    <NA>
    2        Medium 411,498 662,657 348,075
    3         Large 754,001 351,123 259,940
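
One thing worth keeping in mind with this variant: format returns character strings, which is why the missing cell prints as <NA> rather than NA. A small sketch using one of the totals from the table above (the variable names are hypothetical):

```r
# One of the area totals from the table above.
total_area <- 740233

# format() with big.mark inserts the thousands separator and returns a string.
formatted <- format(total_area, big.mark = ",")

formatted
is.character(formatted)
```

So the formatted table is best treated as display-only; for further calculations, keeping the numeric version from summarise(sum(Area)) is likely the safer choice.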