Dynamic sum based on a condition

Time: 2019-05-16 07:15:21

Tags: r

df <-data.frame(
  part = c('A','B','c'),
  start_date = c('2018-12-01','2018-12-06','2018-12-08'),
  end_date = c('2018-12-05','2018-12-07','2018-12-11'),
  X2018.12.01 = c(2,3,4),
  X2018.12.02 = c(5,6,0),
  X2018.12.03 = c(0,3,0),
  X2018.12.04 = c(5,9,1),
  X2018.12.05 = c(1,2,3),
  X2018.12.06 = c(2,3,4),
  X2018.12.07 = c(1,1,1),
  X2018.12.08 = c(6,6,6),
  X2018.12.09 = c(8,7,6),
  X2018.12.10 = c(0,1,1),
  X2018.12.11 = c(1,2,3))

df1 <- setNames(df, c("part", "start_date", "end_date", "2018-12-01",
  "2018-12-02", "2018-12-03", "2018-12-04", "2018-12-05", "2018-12-06",
  "2018-12-07", "2018-12-08", "2018-12-09", "2018-12-10", "2018-12-11"))

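Now I want to create a column in df1 that, for each part, sums the values in the date columns between that part's start date and end date:

  • Part A: the sum from 2018-12-01 to 2018-12-05, i.e. 2 + 5 + 0 + 5 + 1 = 13
  • Part B: the sum from 2018-12-06 to 2018-12-07, i.e. 3 + 1 = 4
  • Part c: the sum from 2018-12-08 to 2018-12-11, i.e. 6 + 6 + 1 + 3 = 16

I hope I have explained my problem clearly.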

2 answers:

Answer 0: (score: 0)

library(tidyverse)
df1 %>%
  gather(date, value, -c(part:end_date)) %>% 
  mutate_at(vars(start_date:date), lubridate::ymd) %>%
  filter(date >= start_date,
         date <= end_date) %>%
  count(part, wt = value)

## A tibble: 3 x 2
#  part      n
#  <fct> <dbl>
#1 A        13
#2 B         4
#3 c        16

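First, I gather the date columns into long format, then convert the date strings to dates, filter to keep only the values that fall within each part's start and end dates, and finally sum them per part.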
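The same long-format steps can also be sketched with the newer tidyr and dplyr verbs, since `gather()` and `mutate_at()` are marked as superseded in current releases. This is only a sketch under a few assumptions: the data is rebuilt here so the block runs on its own, the column name `total` is made up for illustration, and the join back onto `df1` is an addition that the answer above does not perform.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's df1 compactly: one matrix row per part,
# one column per day (check.names = FALSE keeps the "2018-12-01" style names).
days <- format(seq(as.Date("2018-12-01"), as.Date("2018-12-11"), by = "day"))
values <- matrix(
  c(2, 5, 0, 5, 1, 2, 1, 6, 8, 0, 1,   # part A
    3, 6, 3, 9, 2, 3, 1, 6, 7, 1, 2,   # part B
    4, 0, 0, 1, 3, 4, 1, 6, 6, 1, 3),  # part c
  nrow = 3, byrow = TRUE, dimnames = list(NULL, days)
)
df1 <- data.frame(
  part = c("A", "B", "c"),
  start_date = c("2018-12-01", "2018-12-06", "2018-12-08"),
  end_date = c("2018-12-05", "2018-12-07", "2018-12-11"),
  values,
  check.names = FALSE, stringsAsFactors = FALSE
)

totals <- df1 %>%
  # Long format: one row per (part, date) pair.
  pivot_longer(-c(part, start_date, end_date),
               names_to = "date", values_to = "value") %>%
  # Convert the three date-like columns from character to Date.
  mutate(across(c(start_date, end_date, date), as.Date)) %>%
  # Keep only the days inside each part's own window.
  filter(date >= start_date, date <= end_date) %>%
  group_by(part) %>%
  summarise(total = sum(value))

# Attach the per-part totals as a new column (hypothetical name: total).
df1 <- left_join(df1, totals, by = "part")
```

Note that a part with no date column inside its window would be dropped by `filter()` and so would end up with `NA` after the join.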

Answer 1: (score: 0)

Here is a base R approach:

sapply(split(df1, seq(nrow(df1))), function(i)
  # Sum the columns whose names fall in this row's start..end date sequence
  rowSums(i[names(i) %in% as.character(seq.Date(as.Date(i$start_date[1]),
                                                as.Date(i$end_date[1]),
                                                by = 'days'))]))
#1.1 2.2 3.3 
#13   4  16
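Since the question asks for a new column in `df1` rather than a separate vector, a vectorised base R variant might look like the sketch below. It is not the answerer's code: the data is rebuilt so the block is self-contained, the column name `total` is an assumption, and the idea is simply to build a logical row-by-column mask of "date within window" and take row sums of the masked values.

```r
# Rebuild the question's df1: one matrix row per part, one column per day.
days <- format(seq(as.Date("2018-12-01"), as.Date("2018-12-11"), by = "day"))
values <- matrix(
  c(2, 5, 0, 5, 1, 2, 1, 6, 8, 0, 1,   # part A
    3, 6, 3, 9, 2, 3, 1, 6, 7, 1, 2,   # part B
    4, 0, 0, 1, 3, 4, 1, 6, 6, 1, 3),  # part c
  nrow = 3, byrow = TRUE, dimnames = list(NULL, days)
)
df1 <- data.frame(
  part = c("A", "B", "c"),
  start_date = c("2018-12-01", "2018-12-06", "2018-12-08"),
  end_date = c("2018-12-05", "2018-12-07", "2018-12-11"),
  values,
  check.names = FALSE, stringsAsFactors = FALSE
)

# Numeric matrix of the daily values (everything after the first three columns).
vals <- as.matrix(df1[, -(1:3)])

# Work with day counts (numeric) so the comparisons are plain arithmetic.
col_days <- as.numeric(as.Date(colnames(vals)))
start <- as.numeric(as.Date(df1$start_date))
end <- as.numeric(as.Date(df1$end_date))

# in_range[i, j] is TRUE when column j's date lies inside row i's window.
in_range <- outer(start, col_days, "<=") & outer(end, col_days, ">=")

# Multiplying by the mask zeroes out-of-window values; then sum each row.
df1$total <- rowSums(vals * in_range)
```

With the example data this reproduces the totals from the question (13, 4 and 16), and it avoids splitting the data frame row by row.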