Creating a summary data frame from a large data frame

Time: 2017-09-17 16:53:57

Tags: r dataframe

I have a data frame like this:

Cost_Center_ID  Month   Year    Actal_Cost
829054          Nov     2015    2549.45
829056          Nov     2015    49.72
829057          Nov     2015    105241.09
829058          Nov     2015    212.23
829059          Nov     2015    -320306.99
829059          Oct     2012    650
829562          Oct     2011    6662
852564          Dec     2010    154
 .....          ...      ...     ....

I want to create a summary table like this for each Cost_Center:
Cost_Center   Year    Jan      Feb ......          ..   Dec
852564        2015    225.56   526.55                   895464.8
852564        2016    6632.2   225.13
852564        2017    5512.22  ....

I am using R 3.4.1.

3 answers:

Answer 0 (score: 3)

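Here is a way to summarise and then reshape the data frame with the dplyr and tidyr packages. This approach is similar to the reshape method. A good place to learn more is the "Gathering and Spreading" section of R for Data Science.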

Load the packages and data:

library(dplyr)
library(tidyr)
data <- read.table(header = TRUE, text = "
           Cost_Center_ID  Month   Year    Actal_Cost
           829054          Nov     2015    2549.45
           829056          Nov     2015    49.72
           829057          Nov     2015    105241.09
           829058          Nov     2015    212.23
           829059          Nov     2015    -320306.99
           829059          Oct     2012    650
           829562          Oct     2011    6662
           852564          Dec     2010    154")

First, summarise the data by group:

data %>% group_by(Cost_Center_ID, Year, Month) %>% 
  summarise(total = sum(Actal_Cost)) 

  Cost_Center_ID  Year  Month      total
           <int> <int> <fctr>      <dbl>
1         829054  2015    Nov    2549.45
2         829056  2015    Nov      49.72
3         829057  2015    Nov  105241.09
4         829058  2015    Nov     212.23
5         829059  2012    Oct     650.00
6         829059  2015    Nov -320306.99
7         829562  2011    Oct    6662.00
8         852564  2010    Dec     154.00
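For comparison, the same per-group totals can be computed without extra packages using base R's aggregate() with a formula. This is a minimal sketch, not part of the original answer; the object names costs and totals are illustrative.

```r
# Sample data from the question (the column name Actal_Cost is kept as in the original)
costs <- read.table(header = TRUE, text = "
Cost_Center_ID  Month   Year    Actal_Cost
829054          Nov     2015    2549.45
829056          Nov     2015    49.72
829057          Nov     2015    105241.09
829058          Nov     2015    212.23
829059          Nov     2015    -320306.99
829059          Oct     2012    650
829562          Oct     2011    6662
852564          Dec     2010    154")

# Sum Actal_Cost within every Cost_Center_ID / Year / Month combination,
# the base-R counterpart of group_by() + summarise()
totals <- aggregate(Actal_Cost ~ Cost_Center_ID + Year + Month,
                    data = costs, FUN = sum)
totals
```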

The summary can then be reshaped to wide format with tidyr's spread():

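data %>% group_by(Cost_Center_ID, Year, Month) %>% 
  summarise(total = sum(Actal_Cost)) %>% 
  spread(Month, total)
# In tidyr >= 1.0.0 spread() is superseded; the equivalent last step is
# pivot_wider(names_from = Month, values_from = total)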

  Cost_Center_ID  Year   Dec        Nov   Oct
*          <int> <int> <dbl>      <dbl> <dbl>
1         829054  2015    NA    2549.45    NA
2         829056  2015    NA      49.72    NA
3         829057  2015    NA  105241.09    NA
4         829058  2015    NA     212.23    NA
5         829059  2012    NA         NA   650
6         829059  2015    NA -320306.99    NA
7         829562  2011    NA         NA  6662
8         852564  2010   154         NA    NA
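The answer notes that this is similar to the reshape method. As a rough base-R counterpart (a sketch, not the answer's own code), stats::reshape() with direction = "wide" spreads the aggregated totals across month columns. By default it names the new columns Actal_Cost.Dec, Actal_Cost.Nov and so on, so the prefix is stripped afterwards to match the spread() result; the object names are illustrative.

```r
costs <- read.table(header = TRUE, text = "
Cost_Center_ID  Month   Year    Actal_Cost
829054          Nov     2015    2549.45
829056          Nov     2015    49.72
829057          Nov     2015    105241.09
829058          Nov     2015    212.23
829059          Nov     2015    -320306.99
829059          Oct     2012    650
829562          Oct     2011    6662
852564          Dec     2010    154")

# Step 1: one total per Cost_Center_ID / Year / Month
totals <- aggregate(Actal_Cost ~ Cost_Center_ID + Year + Month,
                    data = costs, FUN = sum)

# Step 2: one row per Cost_Center_ID / Year, one column per month;
# combinations with no data become NA, as with spread()
wide <- reshape(totals, direction = "wide",
                idvar = c("Cost_Center_ID", "Year"), timevar = "Month")

# Drop the "Actal_Cost." prefix so the columns are just the month names
names(wide) <- sub("^Actal_Cost\\.", "", names(wide))

# reshape() keeps rows in order of first appearance; sort by id and year
wide <- wide[order(wide$Cost_Center_ID, wide$Year), ]
rownames(wide) <- NULL
wide
```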

Answer 1 (score: 1)

A solution using data.table::dcast:

library(data.table)
# d is the question's data frame (e.g. `data` as read in the answer above)
foo <- data.table(d)[, sum(Actal_Cost), .(Cost_Center_ID, Year, Month)]
dcast(foo, Cost_Center_ID + Year ~ Month, value.var = "V1")

   Cost_Center_ID Year Dec        Nov  Oct
1:         829054 2015  NA    2549.45   NA
2:         829056 2015  NA      49.72   NA
3:         829057 2015  NA  105241.09   NA
4:         829058 2015  NA     212.23   NA
5:         829059 2012  NA         NA  650
6:         829059 2015  NA -320306.99   NA
7:         829562 2011  NA         NA 6662
8:         852564 2010 154         NA   NA
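All three answers only create columns for months that actually occur in the data, and the columns come out in alphabetical order (Dec, Nov, Oct) rather than the Jan to Dec layout shown in the question. One way to get the full calendar layout is to post-process the wide result: add any missing month columns and reorder them with base R's built-in month.abb. The helper complete_months() below is a hypothetical name for this sketch; it assumes a plain data.frame or tibble whose month columns use three-letter English abbreviations, so a data.table result should go through as.data.frame() first.

```r
# Hypothetical helper: give a wide cost table all twelve month columns,
# in calendar order, after the id columns
complete_months <- function(wide, id_cols = c("Cost_Center_ID", "Year"),
                            fill = NA_real_) {
  # month.abb is base R's c("Jan", "Feb", ..., "Dec")
  for (m in setdiff(month.abb, names(wide))) {
    wide[[m]] <- fill   # month with no data at all: use the fill value
  }
  wide[c(id_cols, month.abb)]
}

# Example input shaped like rows 1 and 8 of the outputs above
wide <- data.frame(Cost_Center_ID = c(829054L, 852564L),
                   Year = c(2015L, 2010L),
                   Dec = c(NA, 154),
                   Nov = c(2549.45, NA),
                   Oct = c(NA_real_, NA_real_))
complete_months(wide)
```

For the zero-filled reshape2 output, passing fill = 0 keeps the added columns consistent with the existing ones.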

Answer 2 (score: 0)

Are you trying to cast the data to wide format? If so, the reshape2 package has a built-in function, dcast(), that does exactly this.

myDf <- read.table(header = TRUE, text = "
  Cost_Center_ID   Month   Year    Actal_Cost
  829054           Nov     2015    2549.45
  829056           Nov     2015    49.72
  829057           Nov     2015    105241.09
  829058           Nov     2015    212.23
  829059           Nov     2015    -320306.99
  829059           Oct     2012    650
  829562           Oct     2011    6662
  852564           Dec     2010    154")

library(reshape2)
dcast(myDf, Cost_Center_ID + Year ~ Month, 
      fun.aggregate = sum, value.var = "Actal_Cost")
#   Cost_Center_ID Year            Dec            Nov            Oct
# 1         829054 2015              0        2549.45              0
# 2         829056 2015              0          49.72              0
# 3         829057 2015              0      105241.09              0
# 4         829058 2015              0         212.23              0
# 5         829059 2012              0           0.00            650
# 6         829059 2015              0     -320306.99              0
# 7         829562 2011              0           0.00           6662
# 8         852564 2010            154           0.00              0
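The zeros in this output come from fun.aggregate = sum: reshape2's dcast() fills combinations with no data by applying the aggregation function to an empty vector, and sum(numeric(0)) is 0, so a 0 here can mean either "no rows" or "rows that sum to zero". For comparison, base R's xtabs() builds the same kind of zero-filled sum table without extra packages. The sketch below assumes an illustrative combined Key column (cost centre and year joined by "_") and makes Month a factor with month.abb levels so that all twelve months appear, in calendar order.

```r
costs <- read.table(header = TRUE, text = "
Cost_Center_ID  Month   Year    Actal_Cost
829054          Nov     2015    2549.45
829056          Nov     2015    49.72
829057          Nov     2015    105241.09
829058          Nov     2015    212.23
829059          Nov     2015    -320306.99
829059          Oct     2012    650
829562          Oct     2011    6662
852564          Dec     2010    154")

# Month as a factor with all twelve levels, so empty months still get a column
costs$Month <- factor(costs$Month, levels = month.abb)
# Illustrative combined key: one table row per cost centre and year
costs$Key <- paste(costs$Cost_Center_ID, costs$Year, sep = "_")

# Sum Actal_Cost over each Key x Month cell; empty cells are 0,
# matching dcast(..., fun.aggregate = sum)
tab <- xtabs(Actal_Cost ~ Key + Month, data = costs)
tab
```

If a data frame is needed rather than a table, as.data.frame.matrix(tab) converts it, with the keys as row names.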

For more on converting data from long to wide format and vice versa, see here.