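Subtotals in columns using reshape2

Time: 2012-09-16 08:54:54

Tags: r plyr reshape2

I have now spent some time learning reshape2 and plyr, but I still don't get it. This time I have run into problems with (a) subtotals and (b) passing different aggregation functions. Here is an example that uses data from a tutorial on mrdwab's blog: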
# libraries 
library(plyr)
library(reshape2)
# get data and add few more variables 
book.sales = read.csv("http://news.mrdwab.com/data-booksales")
book.sales$Stock = book.sales$Quantity + 10
book.sales$SubjCat[(book.sales$Subject == 'Economics') | 
  (book.sales$Subject == 'Management')] <- '1_EconSciences'
book.sales$SubjCat[book.sales$Subject %in% 
  c('Anthropology', 'Politics', 'Sociology')] <- '2_SocSciences'
book.sales$SubjCat[book.sales$Subject %in% c('Communication', 'Fiction',
  'History', 'Research', 'Statistics')] <- '3_other'

# to get to my starting dataframe (close to the project I am working on) 
book.sales1 <- ddply(book.sales, c('Region', 'Representative', 'SubjCat', 
                                   'Subject', 'Publisher'), summarize,
                 Stock = sum(Stock), Sold = sum(Quantity),
                 Ratio = round((100 * sum(Quantity)/sum(Stock)), digits = 1))


#melt it 
m.book.sales = melt(data = book.sales1, id.vars = c('Region', 'Representative',
                                        'SubjCat', 'Subject', 'Publisher'),
                    measure.vars = c('Stock', 'Sold', 'Ratio'))

# cast it (please ignore the commented-out cast below; it was a mistake)
# Tab1 <- dcast(data = m.book.sales, 
#         formula = Region + Representative ~ Publisher + variable,
#         fun.aggregate = sum, margins = c('Region', 'Representative'))

Tab1 <- dcast(data = m.book.sales, formula = Region + Representative ~ 
  SubjCat + Subject + variable, fun.aggregate = sum,
              margins = c('Region', 'Representative', 'SubjCat', 'Subject'))
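
To make the effect of `margins` easier to see, here is a minimal sketch on a made-up data frame. The `toy` data and its values are assumptions for illustration only, not the book.sales data. With `margins = TRUE`, dcast should add an `(all)` row and an `(all)` column that hold the aggregated totals.

```r
library(reshape2)

# Hypothetical toy data (not the book.sales data)
toy <- data.frame(
  Region    = c("East", "East", "West", "West"),
  Publisher = c("A", "B", "A", "B"),
  Sold      = c(10, 20, 30, 40)
)

# Long format: one row per Region/Publisher/measure
toy.m <- melt(toy, id.vars = c("Region", "Publisher"), measure.vars = "Sold")

# Regions in rows, publishers in columns, totals in the "(all)" margins
toy.tab <- dcast(toy.m, Region ~ Publisher, fun.aggregate = sum,
                 margins = TRUE)
print(toy.tab)
```

The `(all)` column here is the total over all publishers for each region, which is the kind of column subtotal asked about in question 1.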

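Now my questions:

  1. I have been able to add subtotals in the rows. But is it also possible to add margins in the columns? Say, the total stock for one publisher? Sorry, I meant the total sales across all publishers.

  2. There is a problem with the "Ratio" column. How can I get the "mean" rather than the "sum" for this variable?

  3. Please note: question 1 (about the margin subtotals) can be regarded as solved.

P.S.: I have seen some examples that use reshape. Would you recommend using it instead of reshape2 (which does not seem to include the functionality of two functions)?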

1 answer:

Answer 0: (score: 2)

I'm not sure exactly what you are asking in question 1, but if you want the total stock per Publisher, couldn't you do it like this?

totalofstock <- ddply(book.sales, 'Publisher', function(x)
                      c(subtotals = sum(x$Stock)))

If you want to add it to Tab1, do the following:

Tab1$bloomsburytotalofstock<-totalofstock[1,][[2]]
head(Tab1)
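
As a rough illustration of the two steps above, here is a small sketch on made-up data. The `toy` and `wide` data frames and the `totalA` column name are hypothetical. It computes the per-publisher subtotal with `summarize` instead of an anonymous function, computes a total across all publishers, and shows that assigning a single number to a new column repeats that number on every row.

```r
library(plyr)

# Hypothetical toy data (not the book.sales data)
toy <- data.frame(
  Publisher = c("A", "A", "B", "B"),
  Stock     = c(15, 25, 35, 45)
)

# Per-publisher subtotal, one row per publisher
totals <- ddply(toy, "Publisher", summarize, subtotals = sum(Stock))

# Total across all publishers
grand.total <- sum(toy$Stock)

# A single value assigned to a column is recycled to every row
wide <- data.frame(Region = c("East", "West"))
wide$totalA <- totals$subtotals[totals$Publisher == "A"]
print(wide)
```

So a column added this way carries the same publisher total in each row; it is not a per-row subtotal.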

For question 2, to get the mean instead of the sum you simply change the function from sum to mean,

e.g.

ratiomeans <- ddply(book.sales1, 'Publisher', function(x)
                    c(ratioMEAN = mean(x$Ratio)))
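
One thing that may be worth checking here: the mean of per-row ratios is generally not the same number as the ratio recomputed from the summed totals. The sketch below uses made-up values, and the `ratioPooled` name is only an illustration, to show both side by side so you can pick the one you actually want.

```r
library(plyr)

# Hypothetical toy data (not the book.sales data)
toy <- data.frame(
  Publisher = c("A", "A"),
  Stock     = c(10, 100),
  Sold      = c(5, 10)
)
toy$Ratio <- 100 * toy$Sold / toy$Stock   # per-row ratios: 50 and 10

res <- ddply(toy, "Publisher", summarize,
             ratioMEAN   = mean(Ratio),                    # average of the row ratios
             ratioPooled = 100 * sum(Sold) / sum(Stock))   # ratio of the totals
print(res)
```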

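I would also recommend sticking with reshape2. reshape2 is basically the newer version of reshape. As far as I know, reshape is no longer being developed but is still available, so that people with old code that uses reshape don't have to rewrite everything.

Edit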

# keep only the Ratio rows so that they can be averaged rather than summed
justratio <- m.book.sales[m.book.sales$variable == "Ratio", ]
Tab2 <- dcast(data = justratio, 
        formula = Region + Representative ~ SubjCat + Subject + variable,
        fun.aggregate = mean,
        margins = c('Region', 'Representative', 'SubjCat', 'Subject'))
final <- merge(Tab1, Tab2, by = c("Region", "Representative"))
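
The same idea, one cast per aggregation function followed by a merge, can be sketched on a small made-up data set. Everything named `toy`, `sold.tab`, `ratio.tab` and `both` is hypothetical. Note that Tab1 above already contains summed Ratio columns with the same names as the columns in Tab2, so I would expect merge to tell them apart with `.x` and `.y` suffixes. Casting the summed variables and the averaged variable separately, as below, avoids the clash.

```r
library(reshape2)

# Hypothetical toy data (not the book.sales data)
toy <- data.frame(
  Region  = c("East", "East", "West", "West"),
  Subject = c("X", "Y", "X", "Y"),
  Sold    = c(10, 20, 30, 40),
  Ratio   = c(50, 70, 20, 40)
)
toy.m <- melt(toy, id.vars = c("Region", "Subject"),
              measure.vars = c("Sold", "Ratio"))

# Split by variable; droplevels() removes the unused factor level
sold.m  <- droplevels(toy.m[toy.m$variable == "Sold", ])
ratio.m <- droplevels(toy.m[toy.m$variable == "Ratio", ])

# Sum the counts, average the ratios
sold.tab  <- dcast(sold.m,  Region ~ variable, fun.aggregate = sum)
ratio.tab <- dcast(ratio.m, Region ~ variable, fun.aggregate = mean)

# Combine into one table keyed by Region
both <- merge(sold.tab, ratio.tab, by = "Region")
print(both)
```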