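Two-dimensional aggregation in R? (Creating a heat map)

Time: 2013-03-22 13:44:58

Tags: r aggregate

I have a table with two factor columns that I would like to aggregate into a table that is easy to plot as a heat map.

The table has the following format: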

 City         Date           Revenue     Costs       Manager 
 ____         ____            _______    ______       ___
 New York     Feb 1           2000        200        Stuart
 San Fran     Feb 3           1200        300        John
 Boston       Feb 1           1500        400        Mike
 Boston       Feb 1           1300        200        Cissy

and so on.

I would like to get a two-dimensional summary table of summed revenue in this format:
Sum Revenue  New York     San Fran     Boston   
 ____         ____           ____       ____
 Feb 1        2000             0        2800
 Feb 2          0              0          0
 Feb 3          0             1200        0  

Is there a simple way to do this, or am I stuck using loops?

1 answer:

Answer 0 (score: 3)

As @Arun suggested in the comments, reshape will do this for you.

d<-read.table(text='City         Date           Revenue     Costs
"New York"     "Feb 1"           2000        200
"San Fran"     "Feb 3"           1200        300
Boston       "Feb 1"           1500        400', header=TRUE)
reshape(d[! names(d) %in% 'Costs'], idvar='Date', timevar='City', direction='wide')
#    Date Revenue.New York Revenue.San Fran Revenue.Boston
# 1 Feb 1             2000               NA           1500
# 2 Feb 3               NA             1200             NA
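
The column names produced by reshape carry a "Revenue." prefix, which is awkward for heat map labels. A minimal sketch of stripping it, assuming the same sample data as above (the variable name wide is only illustrative):

```r
# Same sample data as in the answer above
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)

# Keep only the id, time and value columns, then pivot to wide format
wide <- reshape(d[c("Date", "City", "Revenue")],
                idvar = "Date", timevar = "City", direction = "wide")

# Remove the "Revenue." prefix that reshape() prepends to each city column
names(wide) <- sub("^Revenue\\.", "", names(wide))
```

After this, the city columns are simply named after the cities, which tends to read better on plot axes.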

If you want to combine multiple entries for the same city/date first, you can use aggregate.

d<-read.table(text='City         Date           Revenue     Costs
"New York"     "Feb 1"           2000        200
"New York"     "Feb 1"           1000        100
"San Fran"     "Feb 3"           1200        300
Boston       "Feb 1"           1500        400', header=TRUE)
dd<-with(d, aggregate(Revenue, by=list(City=City, Date=Date), sum))
#     City     Date  x
# 1   Boston   Feb 1 1500
# 2 New York   Feb 1 3000
# 3 San Fran   Feb 3 1200
ddd<-reshape(dd, idvar='Date', timevar='City', direction='wide')
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000         NA
# 3 Feb 3       NA         NA       1200
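
As an alternative to the aggregate plus reshape combination (not part of the original answer), base R's xtabs can sum and pivot in one step, and it fills empty city/date combinations with 0 instead of NA. A small sketch under the same sample data, with tab as an illustrative name:

```r
# Sample data with a duplicated New York / Feb 1 entry
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)

# Sum Revenue for every Date (rows) x City (columns) combination;
# combinations with no data come out as 0
tab <- xtabs(Revenue ~ Date + City, data = d)
print(tab)
```

The result is a contingency table rather than a data frame, so it may need as.data.frame.matrix(tab) or unclass(tab) depending on what the plotting function expects.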

Then replace the NA values with 0.

ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 3 Feb 3        0          0       1200
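
Since the goal is a heat map, the wide data frame usually has to become a numeric matrix first. The following is only a sketch: it rebuilds the result shown above by hand, and the plotting options (no clustering, no scaling) are assumptions rather than anything stated in the question.

```r
# Hand-built copy of the wide table shown above
ddd <- data.frame(Date = c("Feb 1", "Feb 3"),
                  "x.Boston" = c(1500, 0),
                  "x.New York" = c(3000, 0),
                  "x.San Fran" = c(0, 1200),
                  check.names = FALSE)

# Drop the Date column, convert to a numeric matrix, and label the rows by date
m <- as.matrix(ddd[-1])
rownames(m) <- ddd$Date
colnames(m) <- sub("^x\\.", "", colnames(m))

# Plot without reordering rows/columns and without row scaling
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```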

To address the issue @Arun mentions below, you can use the merge function to fill in the missing dates before the previous step.

missing.Dates <- c('Feb 2')
ddd<-merge(ddd, data.frame(Date=missing.Dates), by='Date', all=TRUE)
#   Date x.Boston x.New York x.San Fran
#1 Feb 1     1500       3000         NA
#2 Feb 3       NA         NA       1200
#3 Feb 2       NA         NA         NA
ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 2 Feb 3        0          0       1200
# 3 Feb 2        0          0          0
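
Another way to get the missing dates in, shown here only as a sketch, is to declare the full set of dates as factor levels before tabulating. xtabs keeps unused levels by default, so "Feb 2" appears as a row of zeros and the rows follow the level order. The vector all.dates is an assumed, hand-written list of dates.

```r
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)

# Assumed complete list of dates, in the order they should appear
all.dates <- c("Feb 1", "Feb 2", "Feb 3")
d$Date <- factor(d$Date, levels = all.dates)

# Unused factor levels are kept, so "Feb 2" becomes an all-zero row
tab <- xtabs(Revenue ~ Date + City, data = d)
print(tab)
```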