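I have a table with two factor columns that I want to aggregate into a table that can easily be mapped to a heatmap.
The table has the following format: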
City      Date   Revenue  Costs  Manager
________  _____  _______  _____  _______
New York  Feb 1  2000     200    Stuart
San Fran  Feb 3  1200     300    John
Boston    Feb 1  1500     400    Mike
Boston    Feb 1  1300     200    Cissy
and so on.
I would like to get a two-dimensional summary table of revenue in this format:
Sum Revenue  New York  San Fran  Boston
___________  ________  ________  ______
Feb 1        2000      0         2800
Feb 2        0         0         0
Feb 3        0         1200      0
Is there a simple way to do this, or am I stuck with using loops?
Answer 0 (score: 3)
As @Arun suggested in the comments, reshape will do this for you.
# Sample data: one row per City/Date entry
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)
# Drop Costs, then spread City across columns with one row per Date
reshape(d[!names(d) %in% 'Costs'], idvar='Date', timevar='City', direction='wide')
#    Date Revenue.New York Revenue.San Fran Revenue.Boston
# 1 Feb 1             2000               NA           1500
# 2 Feb 3               NA             1200             NA
If you first want to combine multiple entries for the same city/date, you can use aggregate.
# Sample data with two New York entries on Feb 1
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)
# Sum Revenue within each City/Date pair; the summed column is named x
dd <- with(d, aggregate(Revenue, by=list(City=City, Date=Date), sum))
#       City  Date    x
# 1   Boston Feb 1 1500
# 2 New York Feb 1 3000
# 3 San Fran Feb 3 1200
# Spread City across columns with one row per Date
ddd <- reshape(dd, idvar='Date', timevar='City', direction='wide')
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000         NA
# 3 Feb 3       NA         NA       1200
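As a variant of the aggregation step, aggregate also has a formula interface, which keeps the original column name instead of x. The sketch below is only an illustration on the same sample data; the names dd2 and ddd2 are made up here so they do not clash with dd and ddd above.

```r
# Same sample data as above, with two New York entries on Feb 1.
d <- read.table(text = 'City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header = TRUE)

# Formula interface: sum Revenue for every City/Date combination.
# The summed column is named Revenue rather than x.
dd2 <- aggregate(Revenue ~ City + Date, data = d, FUN = sum)

# Reshape as before; the wide columns are now named Revenue.<City>.
ddd2 <- reshape(dd2, idvar = 'Date', timevar = 'City', direction = 'wide')
print(ddd2)
```

Either form should give the same sums; the only difference is how the resulting columns are named.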
Then replace the NAs with 0.
ddd[is.na(ddd)] <- 0
# Date x.Boston x.New York x.San Fran
# 1 Feb 1 1500 3000 0
# 3 Feb 3 0 0 1200
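Since the question mentions a heatmap, it may help to turn the wide data frame into a numeric matrix with the dates as row names. This is a minimal sketch, not part of the original answer; the variable m is introduced here for illustration, and the heatmap call is only one possible way to plot it.

```r
# Rebuild the wide table from the steps above so this block runs on its own.
d <- read.table(text = 'City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header = TRUE)
dd  <- with(d, aggregate(Revenue, by = list(City = City, Date = Date), sum))
ddd <- reshape(dd, idvar = 'Date', timevar = 'City', direction = 'wide')
ddd[is.na(ddd)] <- 0

# Drop the Date column and keep the revenue columns as a numeric matrix.
m <- as.matrix(ddd[, -1])
rownames(m) <- as.character(ddd$Date)            # dates become row labels
colnames(m) <- sub('^x\\.', '', names(ddd)[-1])  # "x.Boston" -> "Boston"
print(m)

# m can then be passed to a plotting function, for example:
# heatmap(m, Rowv = NA, Colv = NA, scale = 'none')
```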
To address the issue @Arun mentions below, you can use the merge function to fill in the missing dates before the previous step.
# Dates that have no rows in the data (listed by hand here)
missing.Dates <- c('Feb 2')
# all=TRUE keeps rows from both sides, so 'Feb 2' is added with NA values
ddd <- merge(ddd, data.frame(Date=missing.Dates), by='Date', all=TRUE)
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000         NA
# 2 Feb 3       NA         NA       1200
# 3 Feb 2       NA         NA         NA
ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 2 Feb 3        0          0       1200
# 3 Feb 2        0          0          0
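For comparison, base R's xtabs can produce the summed, zero-filled table in one step. This is an alternative to the reshape/aggregate/merge route above, not the answerer's method, and it assumes you can list every date that should appear (the levels below are written by hand for this sample).

```r
# Same sample data as above, with two New York entries on Feb 1.
d <- read.table(text = 'City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header = TRUE)

# Declare every date that should appear, including ones absent from the data.
d$Date <- factor(d$Date, levels = c('Feb 1', 'Feb 2', 'Feb 3'))

# Cross-tabulate: sums Revenue per Date/City cell; empty cells come out as 0.
tab <- xtabs(Revenue ~ Date + City, data = d)
print(tab)

# Optional: convert the table to a plain wide data frame.
wide <- as.data.frame.matrix(tab)
```

Because unused factor levels are kept by default, the Feb 2 row appears with zeros and the rows follow the order of the declared levels.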