前五年的总和

时间:2019-01-25 13:25:58

标签: r

我需要在每行中汇总 N_C 变量的前 5年

例如:2017年-Sum_Five_Years = 10(2017)+21(2015)+14(2014)+16(2013)= 61

数据:

library(dplyr) 
DF<-data.frame(company = c("DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM"),
           year= c("2017","2015","2015","2015","2013","2012","2012","2012","2010","2010","2015","2014","2014","2013","2013","2012"),
           N_C=   c("0","7","5","4","3","24","52","99","43","37","5","7","7","4","9","20"), Sum_Year = c("0","21","21","21","16","195","195","195","80","80","21","14","14","16","16","195"))
DF <- DF %>% arrange(year)

         company year N_C Sum_Year
1  DEL MAR PHARM 2010  43       80
2  DEL MAR PHARM 2010  37       80
3  DEL MAR PHARM 2012  24      195
4  DEL MAR PHARM 2012  52      195
5  DEL MAR PHARM 2012  99      195
6  DEL MAR PHARM 2012  20      195
7  DEL MAR PHARM 2013   3       16
8  DEL MAR PHARM 2013   4       16
9  DEL MAR PHARM 2013   9       16
10 DEL MAR PHARM 2014   7       14
11 DEL MAR PHARM 2014   7       14
12 DEL MAR PHARM 2015   7       21
13 DEL MAR PHARM 2015   5       21
14 DEL MAR PHARM 2015   4       21
15 DEL MAR PHARM 2015   5       21
16 DEL MAR PHARM 2017  10       10

预期结果

DF$Sum_Five_Year <- cbind(c("80","80","275","275","275","275","291","291","291","305","305","246","246","246","246","61"))

> DF
         company year N_C Sum_Year Sum_Five_Year
1  DEL MAR PHARM 2010  43       80            80
2  DEL MAR PHARM 2010  37       80            80
3  DEL MAR PHARM 2012  24      195           275
4  DEL MAR PHARM 2012  52      195           275
5  DEL MAR PHARM 2012  99      195           275
6  DEL MAR PHARM 2012  20      195           275
7  DEL MAR PHARM 2013   3       16           291
8  DEL MAR PHARM 2013   4       16           291
9  DEL MAR PHARM 2013   9       16           291
10 DEL MAR PHARM 2014   7       14           305
11 DEL MAR PHARM 2014   7       14           305
12 DEL MAR PHARM 2015   7       21           246
13 DEL MAR PHARM 2015   5       21           246
14 DEL MAR PHARM 2015   4       21           246
15 DEL MAR PHARM 2015   5       21           246
16 DEL MAR PHARM 2017  10       10            61

我尝试了以下代码,但是它不起作用:

 library(data.table)
 setDT(DF)  
 DF[, `:=` (Sum_Five_Year= sum(N_C)), by= list(company,cut(year, breaks = c(5), right = F))]

任何建议将不胜感激:)

1 个答案:

答案 0 :(得分:3)

没有其他软件包,您可以使用sapply

以下代码假定已经创建Sum_Year。您可以将以下内容直接应用于您的示例:

distinct(DF, company, year, Sum_Year) %>%
  group_by(company) %>%
  mutate(
    year = as.integer(as.character(year)),
    Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
  ) %>%
  left_join(DF %>% select(-Sum_Year), by = c("company", "year"))

输出:

# A tibble: 16 x 5
# Groups:   company [?]
   company      year Sum_Year Sum_Five_Year   N_C
   <chr>       <int>    <int>         <int> <int>
 1 DELMARPHARM  2010       80            80    43
 2 DELMARPHARM  2010       80            80    37
 3 DELMARPHARM  2012      195           275    24
 4 DELMARPHARM  2012      195           275    52
 5 DELMARPHARM  2012      195           275    99
 6 DELMARPHARM  2012      195           275    20
 7 DELMARPHARM  2013       16           291     3
 8 DELMARPHARM  2013       16           291     4
 9 DELMARPHARM  2013       16           291     9
10 DELMARPHARM  2014       14           305     7
11 DELMARPHARM  2014       14           305     7
12 DELMARPHARM  2015       21           246     7
13 DELMARPHARM  2015       21           246     5
14 DELMARPHARM  2015       21           246     4
15 DELMARPHARM  2015       21           246     5
16 DELMARPHARM  2017       10            61    10

否则,您可以这样做:

DF %>%
  group_by(company, year) %>%
  mutate(N_C = as.numeric(as.character(N_C))) %>%
  summarise(Sum_Year = sum(N_C)) %>%
  mutate(
    year = as.integer(as.character(year)),
    Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
    ) %>%
  left_join(DF %>% select(-Sum_Year), by = c("company", "year"))

如果您想摆脱重复的格式,只需在最后省略连接:

DF %>%
  group_by(company, year) %>%
  mutate(N_C = as.numeric(as.character(N_C))) %>%
  summarise(Sum_Year = sum(N_C)) %>%
  mutate(
    year = as.integer(as.character(year)),
    Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
    )

输出:

# A tibble: 6 x 4
# Groups:   company [1]
  company      year Sum_Year Sum_Five_Year
  <chr>       <int>    <dbl>         <dbl>
1 DELMARPHARM  2010       80            80
2 DELMARPHARM  2012      195           275
3 DELMARPHARM  2013       16           291
4 DELMARPHARM  2014       14           305
5 DELMARPHARM  2015       21           246
6 DELMARPHARM  2017       10            61