I have a data frame like this:
Cost_Center_ID Month Year Actal_Cost
829054 Nov 2015 2549.45
829056 Nov 2015 49.72
829057 Nov 2015 105241.09
829058 Nov 2015 212.23
829059 Nov 2015 -320306.99
829059 Oct 2012 650
829562 Oct 2011 6662
852564 Dec 2010 154
..... ... ... ....
I want to create a summary table for each Cost_Center, like this:
Cost_Center Year Jan Feb ...... .. Dec
852564 2015 225.56 526.55 895464.8
852564 2016 6632.2 225.13
852564 2017 5512.22 ....
I am using R 3.4.1.
Answer 0: (score: 3)
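Here is a way to summarise and then reshape the data frame using the dplyr and tidyr packages. This approach is similar to the reshape approach. A good place to learn more about it is the "Gathering and Spreading" section of R for Data Science.
Load the packages and data: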
library(dplyr)
library(tidyr)
data <- read.table(header = TRUE, text = "
Cost_Center_ID Month Year Actal_Cost
829054 Nov 2015 2549.45
829056 Nov 2015 49.72
829057 Nov 2015 105241.09
829058 Nov 2015 212.23
829059 Nov 2015 -320306.99
829059 Oct 2012 650
829562 Oct 2011 6662
852564 Dec 2010 154")
First, summarise the data by group:
data %>% group_by(Cost_Center_ID, Year, Month) %>%
  summarise(total = sum(Actal_Cost))
Cost_Center_ID Year Month total
<int> <int> <fctr> <dbl>
1 829054 2015 Nov 2549.45
2 829056 2015 Nov 49.72
3 829057 2015 Nov 105241.09
4 829058 2015 Nov 212.23
5 829059 2012 Oct 650.00
6 829059 2015 Nov -320306.99
7 829562 2011 Oct 6662.00
8 852564 2010 Dec 154.00
You can then "reshape" the result with the tidyr spread command:
data %>% group_by(Cost_Center_ID, Year, Month) %>%
  summarise(total = sum(Actal_Cost)) %>%
  spread(Month, total)
Cost_Center_ID Year Dec Nov Oct
* <int> <int> <dbl> <dbl> <dbl>
1 829054 2015 NA 2549.45 NA
2 829056 2015 NA 49.72 NA
3 829057 2015 NA 105241.09 NA
4 829058 2015 NA 212.23 NA
5 829059 2012 NA NA 650
6 829059 2015 NA -320306.99 NA
7 829562 2011 NA NA 6662
8 852564 2010 154 NA NA
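The question asks for columns in calendar order (Jan to Dec), whereas the output above is sorted alphabetically and only contains the months present in the data. Below is a minimal sketch of one way to get all twelve columns. It assumes tidyr 1.2.0 or later (for `pivot_wider()` with `names_expand`) and dplyr 1.0.0 or later (for `.groups`), neither of which is part of the original answer; `wide` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

data <- read.table(header = TRUE, text = "
Cost_Center_ID Month Year Actal_Cost
829054 Nov 2015 2549.45
829056 Nov 2015 49.72
829057 Nov 2015 105241.09
829058 Nov 2015 212.23
829059 Nov 2015 -320306.99
829059 Oct 2012 650
829562 Oct 2011 6662
852564 Dec 2010 154")

wide <- data %>%
  # month.abb is base R's built-in vector "Jan", "Feb", ..., "Dec"
  mutate(Month = factor(Month, levels = month.abb)) %>%
  group_by(Cost_Center_ID, Year, Month) %>%
  summarise(total = sum(Actal_Cost), .groups = "drop") %>%
  # names_expand = TRUE creates a column for every factor level,
  # including months that never occur in the data
  pivot_wider(names_from = Month, values_from = total, names_expand = TRUE)

names(wide)
```

Months with no records for a given cost center and year are left as NA.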
Answer 1: (score: 1)
A solution using data.table::dcast:
library(data.table)
# d is the data frame from the question
foo <- data.table(d)[, sum(Actal_Cost), .(Cost_Center_ID, Year, Month)]
dcast(foo, Cost_Center_ID + Year ~ Month, value.var = "V1")
Cost_Center_ID Year Dec Nov Oct
1: 829054 2015 NA 2549.45 NA
2: 829056 2015 NA 49.72 NA
3: 829057 2015 NA 105241.09 NA
4: 829058 2015 NA 212.23 NA
5: 829059 2012 NA NA 650
6: 829059 2015 NA -320306.99 NA
7: 829562 2011 NA NA 6662
8: 852564 2010 154 NA NA
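The intermediate table is not strictly required: `dcast` for data.table objects accepts a `fun.aggregate` argument, so the aggregation and the reshape can happen in one call. A sketch, assuming `d` holds the question's data (it is rebuilt here so the block runs on its own; `wide` is an illustrative name):

```r
library(data.table)

d <- read.table(header = TRUE, text = "
Cost_Center_ID Month Year Actal_Cost
829054 Nov 2015 2549.45
829056 Nov 2015 49.72
829057 Nov 2015 105241.09
829058 Nov 2015 212.23
829059 Nov 2015 -320306.99
829059 Oct 2012 650
829562 Oct 2011 6662
852564 Dec 2010 154")

# Sum Actal_Cost per cost center, year and month, and spread months to columns
wide <- dcast(as.data.table(d), Cost_Center_ID + Year ~ Month,
              value.var = "Actal_Cost", fun.aggregate = sum)
wide
```

One difference from the two-step version: when `fill` is not given, data.table fills empty cells by applying the aggregate function to an empty vector, which for `sum` is 0 rather than NA.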
Answer 2: (score: 0)
Are you trying to cast the data to wide format? If so, the reshape2 package has a built-in function for this.
myDf <- read.table(header = TRUE, text = "
Cost_Center_ID Month Year Actal_Cost
829054 Nov 2015 2549.45
829056 Nov 2015 49.72
829057 Nov 2015 105241.09
829058 Nov 2015 212.23
829059 Nov 2015 -320306.99
829059 Oct 2012 650
829562 Oct 2011 6662
852564 Dec 2010 154")
library(reshape2)
dcast(myDf, Cost_Center_ID + Year ~ Month,
      fun.aggregate = sum, value.var = "Actal_Cost")
# Cost_Center_ID Year Dec Nov Oct
# 1 829054 2015 0 2549.45 0
# 2 829056 2015 0 49.72 0
# 3 829057 2015 0 105241.09 0
# 4 829058 2015 0 212.23 0
# 5 829059 2012 0 0.00 650
# 6 829059 2015 0 -320306.99 0
# 7 829562 2011 0 0.00 6662
# 8 852564 2010 154 0.00 0
For more on reshaping data from long to wide format and vice versa, see here
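Note that the zeros in the output above come from `sum()` being applied to an empty set of values, so a month with no records looks the same as a month whose costs net to zero. If that distinction matters, `dcast` in reshape2 takes a `fill` argument. A sketch reusing the `myDf` data from this answer (`wideNA` is an illustrative name):

```r
library(reshape2)

myDf <- read.table(header = TRUE, text = "
Cost_Center_ID Month Year Actal_Cost
829054 Nov 2015 2549.45
829056 Nov 2015 49.72
829057 Nov 2015 105241.09
829058 Nov 2015 212.23
829059 Nov 2015 -320306.99
829059 Oct 2012 650
829562 Oct 2011 6662
852564 Dec 2010 154")

# fill = NA_real_ marks cost center / year / month combinations with no rows
# as missing instead of using sum(numeric(0)), which is 0
wideNA <- dcast(myDf, Cost_Center_ID + Year ~ Month,
                fun.aggregate = sum, value.var = "Actal_Cost",
                fill = NA_real_)
wideNA
```

Cells that do have data are still summed as before; only the structurally empty cells change.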