Sum within groups when NA

Time: 2017-06-29 09:18:16

Tags: r

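My data frame has 3 columns (ID, date, days). Column X is what I want. Where days is NA, I want to add the previous month's days to the number of days in the current month. Can this be done with dplyr? I tried a for loop, but with more than 5M rows it takes too long.
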
  ID     date        days    X
  A     2014-01-31     NA   NA
  A     2014-02-28     NA   NA
  A     2014-03-31      4    4
  A     2014-04-30     NA   34
  A     2014-05-31     NA   65
  A     2014-06-30     NA   95
  A     2014-07-31     NA  126  
  B     2014-01-31     NA   NA
  B     2014-02-28     11   11
  B     2014-03-31      6    6
  B     2014-04-30     NA   36
  B     2014-05-31      6    6
  B     2014-06-30     NA   36
  C     2015-01-31     NA   NA
  C     2015-02-28     NA   NA
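
As a reading aid, the rule implied by column X can be checked by hand for ID A. This is only a sketch of how the table appears to work: the running total starts at the known days value (4) and then grows by the gap, in days, between each month-end date and the one before it. The names dates, start, gaps and x are illustrative and not from the original post.

```r
# Dates and the single known days value for ID "A", copied from the table above
dates <- as.Date(c("2014-03-31", "2014-04-30", "2014-05-31",
                   "2014-06-30", "2014-07-31"))
start <- 4

# Gap in days between each month-end date and the previous one
gaps <- as.numeric(diff(dates))
print(gaps)  # 30 31 30 31

# Running total: the known value first, then one gap added per NA row
x <- cumsum(c(start, gaps))
print(x)     # 4 34 65 95 126, matching column X for ID A
```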

1 answer:

Answer 0 (score: 2)

Here is an attempt using tidyverse:

library(tidyverse)

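df %>% 
 # parse the date strings into Date objects
 mutate(date = as.Date(date, format = '%Y-%m-%d')) %>% 
 group_by(ID) %>% 
 # start a new run at every non-NA days value; leading NA rows get new == 1
 mutate(new = cumsum(!is.na(days))+1) %>% 
 group_by(ID, new) %>% 
 # within each run, accumulate days where known, otherwise a gap between dates
 mutate(new1 = cumsum(ifelse(is.na(days), as.numeric(diff.difftime(date)), days)), 
        # rows before the first known value stay NA
        new1 = replace(new1, new == 1, NA)) %>% 
 ungroup() %>% 
 select(-new)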

# A tibble: 15 x 5
#       ID       date  days     X  new1
#   <fctr>     <date> <int> <int> <dbl>
# 1      A 2014-01-31    NA    NA    NA
# 2      A 2014-02-28    NA    NA    NA
# 3      A 2014-03-31     4     4     4
# 4      A 2014-04-30    NA    34    35
# 5      A 2014-05-31    NA    65    65
# 6      A 2014-06-30    NA    95    96
# 7      A 2014-07-31    NA   126   126
# 8      B 2014-01-31    NA    NA    NA
# 9      B 2014-02-28    11    11    11
#10      B 2014-03-31     6     6     6
#11      B 2014-04-30    NA    36    36
#12      B 2014-05-31     6     6     6
#13      B 2014-06-30    NA    36    36
#14      C 2015-01-31    NA    NA    NA
#15      C 2015-02-28    NA    NA    NA
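
In the output above, new1 differs from X in rows 4 and 6 (35 vs 34, 96 vs 95). A likely cause is that the date differences have one element fewer than the group has rows, so ifelse() recycles them and each NA row picks up the following gap instead of the preceding one. The sketch below is one possible way to align the gaps, assuming the intended rule is "previous value plus the gap to the previous row's date". It swaps in dplyr's lag() and coalesce(); the data frame is rebuilt from the question's table, and the names gap, run and res are illustrative rather than part of the original answer.

```r
library(dplyr)

# Sample data copied from the question (X is the desired result)
df <- data.frame(
  ID   = rep(c("A", "B", "C"), times = c(7, 6, 2)),
  date = c("2014-01-31", "2014-02-28", "2014-03-31", "2014-04-30",
           "2014-05-31", "2014-06-30", "2014-07-31",
           "2014-01-31", "2014-02-28", "2014-03-31", "2014-04-30",
           "2014-05-31", "2014-06-30",
           "2015-01-31", "2015-02-28"),
  days = c(NA, NA, 4, NA, NA, NA, NA,
           NA, 11, 6, NA, 6, NA,
           NA, NA),
  X    = c(NA, NA, 4, 34, 65, 95, 126,
           NA, 11, 6, 36, 6, 36,
           NA, NA)
)

res <- df %>%
  mutate(date = as.Date(date)) %>%
  group_by(ID) %>%
  mutate(
    # Gap in days to the previous row of the same ID (NA on the first row)
    gap = as.numeric(date - lag(date)),
    # Run counter: 0 before the first non-NA days value, then +1 at each one
    run = cumsum(!is.na(days))
  ) %>%
  group_by(ID, run) %>%
  # Within a run: start from days, then add one aligned gap per NA row
  mutate(new1 = cumsum(coalesce(days, gap)),
         # Rows before the first known value have nothing to add to
         new1 = replace(new1, run == 0, NA)) %>%
  ungroup() %>%
  select(-gap, -run)

# Stops with an error if new1 does not reproduce the desired column X
stopifnot(isTRUE(all.equal(res$new1, res$X)))
```

Because lag() is evaluated per ID, every row carries its own preceding gap, so no recycling is involved. On 5M rows the grouped mutate should still be far faster than a for loop, though that has not been measured here.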