我有一个数据集,其字段包含isic(国际标准行业分类),日期和现金。我想先按部门将其分组,然后再按会计年度获得总和。
#Here's a look at the data(cpt1). All the dates follow the following format "%Y-%m-01"
Cash Date isic
1 373165 2014-06-01 K
2 373165 2014-12-01 K
3 373165 2017-09-01 K
4 NA <NA> K
5 4789 2015-05-01 K
6 982121 2013-07-01 K
.
.
.
#I was able to group to group them by sector and sum them
cpt_by_sector=cpt1 %>% mutate(sector=recode_factor(isic,
'A'='Agriculture','B'='Industry','C'='Industry','D'='Industry',
'E'='Industry','F'='Industry',.default = 'Services',
.missing = 'Services')) %>%
group_by(sector) %>% summarise_if(is.numeric, sum, na.rm=T)
#here's the result
sector `Cash`
<fct> <dbl>
1 Agriculture 2094393819.
2 Industry 53699068183.
3 Services 223995196357.
#Below is what I would like to get. I would like to take into account the fiscal year i.e. from july to june.
Sector `2009/10` `2010/11` `2011/12` `2012/13` `2013/14` `2014/15` `2015/16` `2016/17`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Agriculture 2.02 3.62 3.65 6.26 7.04 8.36 11.7 11.6
2 Industry 87.8 117. 170. 163. 185. 211. 240. 252.
3 Services 271. 343. 479. 495. 584. 664. 738. 821.
4 Total 361. 464. 653. 664. 776. 883. 990. 1085.
PS:我将日期列更改为日期格式
答案 0 :(得分:1)
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
# FY is the year of the date, plus 1 if the month is July or later.
# FY_label makes the requested format, by combining the prior year,
# a slash, and digits 3&4 of the FY.
mutate(FY = year(Date) + if_else(month(Date) >= 7, 1, 0),
FY_label = paste0(FY-1, "/", substr(FY, 3, 4))) %>%
mutate(sector = recode_factor(isic,
'A'='Agriculture','B'='Industry','C'='Industry','D'='Industry',
'E'='Industry','F'='Industry', 'K'='Mystery Sector')) %>%
filter(!is.na(FY)) %>% # Exclude rows with missing FY
group_by(FY_label, sector) %>%
summarise(Cash = sum(Cash)) %>%
spread(FY_label, Cash)
# A tibble: 1 x 4
sector `2013/14` `2014/15` `2017/18`
<fct> <int> <int> <int>
1 Mystery Sector 1355286 377954 373165