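Time series summary in R

Time: 2018-04-10 00:07:30

Tags: r

I am trying to sum numbers with a time lag of 1. That is, within each group I want to summarise rows by adding up the freq values of rows whose dates are one day apart. I used the lag function to get the differences, but I don't know how to proceed from here.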

df <- df %>% 
  group_by(group) %>% 
  mutate(diff = dt - lag(dt))

df[!is.na(df$diff) & df$diff > 1,]$diff <- NA

For example:

 group     dt           freq  diff  
 groupA    2016-03-21    1     NA    
 groupA    2016-03-22    1     1     
 groupA    2016-03-23    1     1     
 groupA    2016-03-26    2     NA     
 groupA    2016-03-28    1     NA     
 groupA    2016-03-29    3     1     
 groupA    2016-03-30    3     1     
 groupA    2016-03-31    5     1     
 groupB    2016-04-01    1     NA      
 groupB    2016-04-02    2     1 

I need to group it as:

group    dt         freq  diff  duration     
groupA  2016-03-21    1     NA    3 (1 + 1 + 1)     
groupA  2016-03-22    1     1         
groupA  2016-03-23    1     1         
groupA  2016-03-26    2     NA    2     
groupA  2016-03-28    1     NA    12(1 + 3 + 3 + 5)     
groupA  2016-03-29    3     1         
groupA  2016-03-30    3     1         
groupA  2016-03-31    5     1         
groupB  2016-04-01    1     NA    3(1 + 2)     
groupB  2016-04-02    2     1 

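I also looked at this, but a cumulative sum does not work there because it does not account for jumps of more than one day. Is looping inside a custom function the only way? :(

Thanks.

2 answers:

Answer 0: (score: 0)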

Here is a tidyverse solution using dplyr::lead:

library(tidyverse)

df %>%
    mutate(dt = as.POSIXct(dt)) %>%
    group_by(group) %>%
    mutate(
        # smaller of the gap to the previous date and the gap to the next date
        diff = pmin(c(1, diff(dt)), c(1, diff(lead(dt))), na.rm = TRUE),
        # start a new id whenever diff changes or exceeds one day
        id = cumsum(c(TRUE, diff(diff) != 0) | diff > 1)) %>%
    group_by(group, id) %>%
    mutate(duration = sum(freq)) %>%
    ungroup() %>%
    select(-diff, -id)
## A tibble: 10 x 4
#   group  dt                   freq duration
#   <fct>  <dttm>              <int>    <int>
# 1 groupA 2016-03-21 00:00:00     1        3
# 2 groupA 2016-03-22 00:00:00     1        3
# 3 groupA 2016-03-23 00:00:00     1        3
# 4 groupA 2016-03-26 00:00:00     2        2
# 5 groupA 2016-03-28 00:00:00     1       12
# 6 groupA 2016-03-29 00:00:00     3       12
# 7 groupA 2016-03-30 00:00:00     3       12
# 8 groupA 2016-03-31 00:00:00     5       12
# 9 groupB 2016-04-01 00:00:00     1        3
#10 groupB 2016-04-02 00:00:00     2        3

Explanation: diff is the minimum of the differences to the preceding and the following date. We then look for changes in diff and create a new grouping vector id, by which we compute the summary measure sum(freq).

Sample data

df <- read.table(text =
    " group     dt           freq  diff
 groupA    2016-03-21    1     NA
 groupA    2016-03-22    1     1
 groupA    2016-03-23    1     1
 groupA    2016-03-26    2     NA
 groupA    2016-03-28    1     NA
 groupA    2016-03-29    3     1
 groupA    2016-03-30    3     1
 groupA    2016-03-31    5     1
 groupB    2016-04-01    1     NA
 groupB    2016-04-02    2     1 ", header = TRUE)

Update

For your second example:

Answer 1: (score: 0)

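This can be done more easily with the following approach (grouping rows with less.than 1 day difference). It creates a helper column gap, which is then used to sum the freq of consecutive days within the same group.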
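The helper-column idea can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's own code: it assumes dplyr is installed, rebuilds the question's sample data with dt as a Date, and uses gap as a run id that increases whenever the step from the previous date is not exactly one day. The name result is hypothetical.

```r
library(dplyr)

# Sample data from the question; the diff column is left out because
# the gaps between dates are recomputed below.
df <- data.frame(
  group = c(rep("groupA", 8), rep("groupB", 2)),
  dt = as.Date(c("2016-03-21", "2016-03-22", "2016-03-23", "2016-03-26",
                 "2016-03-28", "2016-03-29", "2016-03-30", "2016-03-31",
                 "2016-04-01", "2016-04-02")),
  freq = c(1, 1, 1, 2, 1, 3, 3, 5, 1, 2)
)

result <- df %>%
  arrange(group, dt) %>%
  group_by(group) %>%
  # gap: run id within each group; a new run starts on the first row and
  # whenever the previous date is not exactly one day earlier
  mutate(gap = cumsum(c(TRUE, as.numeric(diff(dt), units = "days") != 1))) %>%
  group_by(group, gap) %>%
  # total freq over each run of consecutive days, repeated on every row
  mutate(duration = sum(freq)) %>%
  ungroup()

print(result)
```

Unlike the desired output in the question, this sketch repeats duration on every row of a run; keeping one row per run would only need summarise(dt = first(dt), duration = sum(freq)) in place of the last mutate.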