Question

I have a dataset of time periods

library(data.table)

active <- data.table(
  id  = c(1, 1, 2, 3),
  beg = as.POSIXct(c("2018-01-01 01:10:00", "2018-01-01 01:50:00", "2018-01-01 01:50:00", "2018-01-01 01:50:00")),
  end = as.POSIXct(c("2018-01-01 01:20:00", "2018-01-01 02:00:00", "2018-01-01 02:00:00", "2018-01-01 02:00:00"))
)
> active
   id                 beg                 end 
1:  1 2018-01-01 01:10:00 2018-01-01 01:20:00 
2:  1 2018-01-01 01:50:00 2018-01-01 02:00:00    
3:  2 2018-01-01 01:50:00 2018-01-01 02:00:00    
4:  3 2018-01-01 01:50:00 2018-01-01 02:00:00

during which each ID is active. I want to aggregate across ids and determine, for each point in

# name the column explicitly; an unnamed column would be called V1
time <- data.table(time = seq(from = min(active$beg), to = max(active$end), by = "mins"))

the number of inactive IDs and the average number of minutes until they become active. That is, ideally the table would look like

> ans
                   time  inactive av.time
 1: 2018-01-01 01:10:00         2      30
 2: 2018-01-01 01:11:00         2      29
...
51: 2018-01-01 02:00:00         0       0

I believe this can be done with data.table, but I cannot figure out the syntax for getting the time differences.

Answer 1

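With dplyr, we can join on a dummy variable to create the Cartesian product of time and active. The definitions of inactive and av.time may not be exactly what you are looking for (for example, inactive here counts rows of active rather than distinct ids), but this should get you started. If your data is very large, I agree that data.table would be a better way to handle this.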

library(tidyverse)

time %>% 
  mutate(dummy = TRUE) %>% 
  inner_join({
    active %>% 
      mutate(dummy = TRUE)
    #join by the dummy variable to get the Cartesian product
  }, by = c("dummy" = "dummy")) %>% 
  select(-dummy) %>% 
  #define what makes an id inactive and the time until it becomes active
  mutate(inactive = time < beg | time > end,
         TimeUntilActive = ifelse(beg > time, difftime(beg, time, units = "mins"), NA)) %>% 
  #group by time and summarise
  group_by(time) %>% 
  summarise(inactive = sum(inactive),
            av.time = mean(TimeUntilActive, na.rm = TRUE))

# A tibble: 51 x 3
   time                inactive av.time
   <dttm>                 <int>   <dbl>
 1 2018-01-01 01:10:00        3      40
 2 2018-01-01 01:11:00        3      39
 3 2018-01-01 01:12:00        3      38
 4 2018-01-01 01:13:00        3      37
 5 2018-01-01 01:14:00        3      36
 6 2018-01-01 01:15:00        3      35
 7 2018-01-01 01:16:00        3      34
 8 2018-01-01 01:17:00        3      33
 9 2018-01-01 01:18:00        3      32
10 2018-01-01 01:19:00        3      31
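
For comparison, here is a rough data.table sketch of the same cross-product idea, collapsed to one row per id before averaging so that an id with several periods is counted once. It is only one possible reading of the question, not a definitive solution. The names grid, period, pairs, per_id, is_active and wait are made up for illustration, tz = "UTC" is an assumption to keep the example reproducible, and treating an id as active on both endpoints of a period and reporting 0 when no id is waiting are also assumptions.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility.
active <- data.table(
  id  = c(1, 1, 2, 3),
  beg = as.POSIXct(c("2018-01-01 01:10:00", "2018-01-01 01:50:00",
                     "2018-01-01 01:50:00", "2018-01-01 01:50:00"), tz = "UTC"),
  end = as.POSIXct(c("2018-01-01 01:20:00", "2018-01-01 02:00:00",
                     "2018-01-01 02:00:00", "2018-01-01 02:00:00"), tz = "UTC")
)
grid <- data.table(time = seq(min(active$beg), max(active$end), by = "mins"))

# Pair every period with every minute of the grid (Cartesian product).
active[, period := .I]
pairs <- active[, .(time = grid$time), by = .(period, id, beg, end)]

# Per pair: is the period covering this minute, and how long until it starts?
pairs[, is_active := time >= beg & time <= end]
pairs[, wait := ifelse(beg > time,
                       as.numeric(difftime(beg, time, units = "mins")),
                       NA_real_)]

# Collapse to one row per (time, id): an id is inactive if none of its
# periods covers the minute; its wait is the time to its next period start.
per_id <- pairs[, {
  inactive <- !any(is_active)
  wait_min <- if (inactive && any(!is.na(wait))) min(wait, na.rm = TRUE) else NA_real_
  .(inactive = inactive, wait = wait_min)
}, by = .(time, id)]

# Summarise per minute; 0 when no inactive id is waiting (assumption).
ans <- per_id[, .(
  inactive = sum(inactive),
  av.time  = if (any(!is.na(wait))) mean(wait, na.rm = TRUE) else 0
), keyby = time]
```

On the sample data this gives inactive = 2 and av.time = 40 at 01:10 (ids 2 and 3 both start at 01:50), and inactive = 0, av.time = 0 at 02:00. The cross product has nrow(grid) * nrow(active) rows, so for large inputs a non-equi join would likely scale better than this sketch.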

R: data.table: aggregating using reference to periods
