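I am trying to sum values whose dates are one day apart. That is, within each group I want to collapse rows by adding up the freq values of dates that differ by exactly one day. I used the lag function to get the differences, but I don't know how to proceed from here.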
df <- df %>%
group_by(group) %>%
mutate(diff = dt - lag(dt))
df[!is.na(df$diff) & df$diff > 1,]$diff <- NA
For example:
group dt freq diff
groupA 2016-03-21 1 NA
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA
groupA 2016-03-28 1 NA
groupA 2016-03-29 3 1
groupA 2016-03-30 3 1
groupA 2016-03-31 5 1
groupB 2016-04-01 1 NA
groupB 2016-04-02 2 1
I need to group this as:
group dt freq diff duration
groupA 2016-03-21 1 NA 3 (1 + 1 + 1)
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA 2
groupA 2016-03-28 1 NA 12 (1 + 3 + 3 + 5)
groupA 2016-03-29 3 1
groupA 2016-03-30 3 1
groupA 2016-03-31 5 1
groupB 2016-04-01 1 NA 3 (1 + 2)
groupB 2016-04-02 2 1
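I also looked at this, but a plain cumulative sum does not work here, because it does not account for jumps of more than one day. Is a loop inside a custom function the only way? :(
Thanks.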
Answer 0 (score: 0)
Here is a tidyverse solution using dplyr::lead:
library(tidyverse)
df %>%
    mutate(dt = as.POSIXct(dt)) %>%
    group_by(group) %>%
    mutate(
        # smaller of the gaps to the previous and to the next date
        diff = pmin(c(1, diff(dt)), c(1, diff(lead(dt))), na.rm = TRUE),
        # start a new run whenever diff changes or exceeds one day
        id = cumsum(c(TRUE, diff(diff) != 0) | diff > 1)) %>%
    group_by(group, id) %>%
    mutate(duration = sum(freq)) %>%
    ungroup() %>%
    select(-diff, -id)
## A tibble: 10 x 4
# group dt freq duration
# <fct> <dttm> <int> <int>
# 1 groupA 2016-03-21 00:00:00 1 3
# 2 groupA 2016-03-22 00:00:00 1 3
# 3 groupA 2016-03-23 00:00:00 1 3
# 4 groupA 2016-03-26 00:00:00 2 2
# 5 groupA 2016-03-28 00:00:00 1 12
# 6 groupA 2016-03-29 00:00:00 3 12
# 7 groupA 2016-03-30 00:00:00 3 12
# 8 groupA 2016-03-31 00:00:00 5 12
# 9 groupB 2016-04-01 00:00:00 1 3
#10 groupB 2016-04-02 00:00:00 2 3
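Explanation: We take the minimal difference between the previous and the next date. We then look for changes in diff and create a new grouping vector id, by which we calculate the summary metric sum(freq).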
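To make the explanation concrete, the steps can be traced on the groupA dates alone. This is a minimal base R sketch, not the answer's own code: the names prev_gap, next_gap and min_gap are introduced here purely for illustration, and as.Date is assumed instead of as.POSIXct so that differences are always in days.

```r
# groupA rows from the question
dt   <- as.Date(c("2016-03-21", "2016-03-22", "2016-03-23", "2016-03-26",
                  "2016-03-28", "2016-03-29", "2016-03-30", "2016-03-31"))
freq <- c(1, 1, 1, 2, 1, 3, 3, 5)

# Days since the previous date; the first row is padded with 1
prev_gap <- c(1, as.numeric(diff(dt)))
# Same computation on the dates shifted by one (a base R stand-in for lead)
next_gap <- c(1, as.numeric(diff(c(dt[-1], NA))))

# Smaller of the two gaps, ignoring the trailing NA
min_gap <- pmin(prev_gap, next_gap, na.rm = TRUE)

# A new run starts when min_gap changes or is larger than one day
id <- cumsum(c(TRUE, diff(min_gap) != 0) | min_gap > 1)

# Sum freq within each run and repeat the total on every row of the run
duration <- ave(freq, id, FUN = sum)

print(data.frame(dt, freq, min_gap, id, duration))
```

Printing the intermediate columns next to each other shows where each run begins, which is easier to check by eye than the final totals alone.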
Sample data, taken from your example:
df <- read.table(text =
" group dt freq diff
groupA 2016-03-21 1 NA
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA
groupA 2016-03-28 1 NA
groupA 2016-03-29 3 1
groupA 2016-03-30 3 1
groupA 2016-03-31 5 1
groupB 2016-04-01 1 NA
groupB 2016-04-02 2 1", header = TRUE)
Answer 1 (score: 0)
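This can be done more easily with the following approach (grouping rows that are at most 1 day apart); it creates a helper column gap, which is later used to sum freq over consecutive days within the same group: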
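The helper-column idea can be sketched as follows. This is an illustration in base R under stated assumptions, not the answerer's original code: the data are assumed to be sorted by group and dt, the intermediate names day_diff and new_run are hypothetical, and gap is taken to be a running counter that increases whenever a row is more than one day after the previous row of its group.

```r
# Sample data from the question (assumed sorted by group, then dt)
df <- data.frame(
  group = c(rep("groupA", 8), rep("groupB", 2)),
  dt    = as.Date(c("2016-03-21", "2016-03-22", "2016-03-23", "2016-03-26",
                    "2016-03-28", "2016-03-29", "2016-03-30", "2016-03-31",
                    "2016-04-01", "2016-04-02")),
  freq  = c(1, 1, 1, 2, 1, 3, 3, 5, 1, 2)
)

# Days since the previous row of the same group (NA on each group's first row)
day_diff <- ave(as.numeric(df$dt), df$group,
                FUN = function(x) c(NA, diff(x)))

# A row opens a new run if it is first in its group or follows a jump > 1 day
new_run <- is.na(day_diff) | day_diff > 1

# Helper column: running count of runs within each group
df$gap <- ave(as.integer(new_run), df$group, FUN = cumsum)

# Sum freq over each (group, gap) run and repeat the total on every row
df$duration <- ave(df$freq, df$group, df$gap, FUN = sum)

print(df)
```

Because gap restarts at 1 in every group, it has to be combined with group when summing; that is why both are passed to ave in the last step.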