Question

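There are 10 projects split between groups A and B, each with a different start and end date. For every day within a given period, I need to compute the sum of outputX and the sum of outputY. I managed to do this across all projects, but how do I split the results by group?

I have made several attempts with lapply() and purrr::map(), and I also looked at filter and split, but to no avail. Below is an example that does not distinguish between groups.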

library(tidyverse)
library(lubridate)

df <- data.frame(
  project = 1:10,
  group = c("A","B"),
  outputX = rnorm(2),
  outputY = rnorm(5),
  start_date = sample(seq(as.Date('2018-01-3'), as.Date('2018-1-13'), by="day"), 10),
  end_date = sample(seq(as.Date('2018-01-13'), as.Date('2018-01-31'), by="day"), 10))
df$interval <- interval(df$start_date, df$end_date)

period <- data.frame(date = seq(as.Date("2018-01-08"), as.Date("2018-01-17"), by = 1))

df_sum <- do.call(rbind, lapply(period$date, function(x){
  index <- x %within% df$interval;
  list("X" = sum(df$outputX[index]),
       "Y" = sum(df$outputY[index]))}))

outcome <- cbind(period, df_sum) %>% gather("id", "value", 2:3)

outcome

In the end it should be a 40x4 table. Any suggestions are much appreciated!

Answer 1

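If I understand you correctly, you need an inner join, so I would suggest sqldf. See https://stackoverflow.com/a/11895368/9300556

With your data, we can do it like this. There is no need to compute df$interval, but we do need to add an id column to period, otherwise sqldf does not work.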

library(tidyverse)  # provides %>%, mutate(), group_by(), summarise(), gather()

df <- data.frame(
  project = 1:10,
  group = c("A","B"),
  outputX = rnorm(2),
  outputY = rnorm(5),
  start = sample(seq(as.Date('2018-01-3'), as.Date('2018-1-13'), by="day"), 10),
  end = sample(seq(as.Date('2018-01-13'), as.Date('2018-01-31'), by="day"), 10))
# df$interval <- interval(df$start_date, df$end_date)

period <- data.frame(date = seq(as.Date("2018-01-08"), as.Date("2018-01-17"), by = 1)) %>% 
  mutate(id = 1:nrow(.))

Then we can use sqldf:

# >= keeps the start date inclusive, matching %within% in the question
sqldf::sqldf("select * from period inner join df 
              on (period.date >= df.start and period.date <= df.end) ") %>% 
  as_tibble() %>% 
  group_by(date, group) %>% 
  summarise(X = sum(outputX),
            Y = sum(outputY)) %>% 
  gather(id, value, -group, -date)
# A tibble: 40 x 4
# Groups:   date [10]
   date       group id    value
   <date>     <fct> <chr> <dbl>
 1 2018-01-08 A     X      3.04
 2 2018-01-08 B     X      2.34
 3 2018-01-09 A     X      3.04
 4 2018-01-09 B     X      3.51
 5 2018-01-10 A     X      3.04
 6 2018-01-10 B     X      4.68
 7 2018-01-11 A     X      4.05
 8 2018-01-11 B     X      4.68
 9 2018-01-12 A     X      4.05
10 2018-01-12 B     X      5.84
# ... with 30 more rows
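
For readers who would rather not depend on sqldf, the same range join can be sketched with base R's merge() and dplyr. This is only an illustration under a few assumptions that are not in the original post: a fixed seed, outputX and outputY drawn with one value per project, and both interval ends treated as inclusive (as %within% does). merge() with by = NULL builds the Cartesian product of dates and projects, and filter() then keeps the dates that fall inside each project's span.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

df <- data.frame(
  project = 1:10,
  group   = rep(c("A", "B"), times = 5),
  outputX = rnorm(10),  # assumption: one value per project
  outputY = rnorm(10),
  start   = sample(seq(as.Date("2018-01-03"), as.Date("2018-01-13"), by = "day"), 10),
  end     = sample(seq(as.Date("2018-01-13"), as.Date("2018-01-31"), by = "day"), 10)
)

period <- data.frame(date = seq(as.Date("2018-01-08"), as.Date("2018-01-17"), by = "day"))

outcome <- merge(period, df, by = NULL) %>%  # every date paired with every project
  filter(date >= start, date <= end) %>%     # keep dates inside the project span
  group_by(date, group) %>%
  summarise(X = sum(outputX), Y = sum(outputY), .groups = "drop") %>%
  gather(id, value, X, Y)                    # long format: date, group, id, value

outcome
```

The result has the four columns date, group, id and value. A date and group combination only appears if at least one project of that group is active on that day, so the table has at most 40 rows. On 2018-01-13 every project is active by construction, which makes that day a convenient sanity check: the X value for group A should equal the sum of outputX over all group A projects.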
