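I have a time series df1. df2 gives the Start and Stop dates and the Difference in Value between them. Apart from that final difference between the dates (as shown in df2), I want to find the daily difference between Start and Stop.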
Does anyone know where to start? I began by using merge in R to combine df2 and df1, but after that I got stuck.
df1:
Date Value
20120509 1018.53
20120510 1026.5
20120511 1032.41
20120514 1004.87
20120515 999.22
20120516 986.94
20120518 955.98
df2:
structure(list(Start = c(20030127L, 20030128L, 20030129L, 20030205L,
20030210L, 20030228L, 20030307L, 20030310L, 20030313L, 20030331L,
20030402L, 20030513L, 20030519L, 20030520L, 20030521L, 20030625L,
20030701L, 20030919L, 20030922L, 20030923L, 20030925L, 20030930L,
20031112L, 20031120L, 20031128L, 20031217L, 20031218L, 20040130L,
20040205L, 20040225L, 20040316L, 20040322L, 20040323L, 20040430L,
20040504L, 20040506L, 20040507L, 20040510L, 20040512L, 20040517L,
20040621L, 20040622L, 20040708L, 20040709L, 20040712L, 20040719L,
20040720L, 20040727L, 20040811L, 20040812L, 20040816L, 20040928L,
20041015L, 20041021L, 20041025L, 20041125L, 20041210L, 20041220L,
20050121L, 20050124L), Stop = c(20030128L, 20030129L, 20030205L,
20030210L, 20030217L, 20030307L, 20030310L, 20030311L, 20030320L,
20030401L, 20030409L, 20030519L, 20030520L, 20030521L, 20030528L,
20030701L, 20030708L, 20030922L, 20030923L, 20030924L, 20030930L,
20031007L, 20031119L, 20031127L, 20031205L, 20031218L, 20031230L,
20040204L, 20040212L, 20040303L, 20040322L, 20040323L, 20040330L,
20040503L, 20040506L, 20040507L, 20040510L, 20040512L, 20040517L,
20040525L, 20040622L, 20040630L, 20040709L, 20040712L, 20040719L,
20040720L, 20040726L, 20040803L, 20040812L, 20040816L, 20040823L,
20041005L, 20041020L, 20041022L, 20041101L, 20041202L, 20041217L,
20041228L, 20050124L, 20050131L), Difference = c(-132, -204,
-455, -1640, 3678, -1516, -610, -247, 4280, -378, 1138, -1386,
-174, -247, 2003, -431, 2725, -149, -420, -580, -459, 2211, -578,
1100, 812, -76, 2191, -1009, 2041, 2462, -1109, -277, 1733, -189,
-815, -161, -694, -153, -141, 932, -473, 1961, -452, -368, -332,
-83, -737, 664, -465, -632, 2261, 3159, -432, -1000, 2456, 958,
-463, 419, -310, 1334)), .Names = c("Start", "Stop", "Difference"
), row.names = c(NA, 60L), class = "data.frame")
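Rather than merge, one option is to attach each df1 row to the df2 interval that contains it using base R's findInterval(). What follows is a minimal sketch, not code from the question or the answers. It assumes df1 is sorted by Date and df2 by Start. Because the posted df2 covers 2003 to 2005 while the df1 sample is from 2012, it uses a hypothetical two-row df2_demo invented to overlap df1. The helper name interval_change is also hypothetical. For each row inside an interval, Change is Value minus the first Value observed in that interval. Rows outside every interval get NA.

```r
# Sample series from the question
df1 <- data.frame(
  Date  = c(20120509L, 20120510L, 20120511L, 20120514L,
            20120515L, 20120516L, 20120518L),
  Value = c(1018.53, 1026.5, 1032.41, 1004.87, 999.22, 986.94, 955.98)
)

# Hypothetical intervals (NOT the posted df2), chosen to overlap df1
df2_demo <- data.frame(
  Start = c(20120509L, 20120515L),
  Stop  = c(20120514L, 20120518L)
)

# Hypothetical helper: change in Value since the start of the enclosing interval
interval_change <- function(df1, df2) {
  # index of the last Start <= Date (0 if Date precedes every Start);
  # findInterval() requires df2$Start to be sorted ascending.
  # YYYYMMDD integers sort chronologically, so they can be compared directly
  idx <- findInterval(df1$Date, df2$Start)
  # a row belongs to interval idx only if it is also on or before that Stop
  inside <- idx > 0 & df1$Date <= df2$Stop[pmax(idx, 1)]
  change <- rep(NA_real_, nrow(df1))  # rows outside every interval stay NA
  # within each interval: Value minus the first Value observed in it
  change[inside] <- ave(df1$Value[inside], idx[inside],
                        FUN = function(x) x - x[1])
  df1$Change <- change
  df1
}

res <- interval_change(df1, df2_demo)
print(res)
```

A date that is both one row's Stop and the next row's Start (which happens often in the posted df2) is assigned to the later interval, because findInterval() picks the last Start on or before that date.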
Answer 0 (score: 2)
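You can do it like this. It works as long as the Date column is sorted in ascending order. It builds a grouping variable by checking which dates appear in df2$Start, computes the cumulative sum of the differences within each group, and then uses unlist to put the results into a single vector.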
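# group id increases by one at each date listed in df2$Start (Date must be ascending)
# within each group: running total of day-to-day changes, starting at 0
df1$Change <- unlist(tapply(df1$Value,
                            cumsum(df1$Date %in% df2$Start),
                            function(x) cumsum(c(0, diff(x)))))
df1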
Date Value Change
1 20120509 1018.53 0.00
2 20120510 1026.50 7.97
3 20120511 1032.41 13.88
4 20120514 1004.87 -13.66
5 20120515 999.22 0.00
6 20120516 986.94 -12.28
7 20120518 955.98 -43.24
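Since cumsum(c(0, diff(x))) telescopes to x - x[1], the per-group step can also be written directly. The sketch below is an alternative, not the answerer's code. It uses ave(), which writes results back in the original row order, so it does not depend on unlist() concatenating the groups in sequence. It still relies on Date being sorted for the cumsum() grouping. The vector starts is a hypothetical stand-in for df2$Start, chosen so that a new group begins at 20120515 as in the output above.

```r
df1 <- data.frame(
  Date  = c(20120509L, 20120510L, 20120511L, 20120514L,
            20120515L, 20120516L, 20120518L),
  Value = c(1018.53, 1026.5, 1032.41, 1004.87, 999.22, 986.94, 955.98)
)
starts <- c(20120509L, 20120515L)  # hypothetical stand-in for df2$Start

# group id goes up by one at every Start date (needs Date sorted ascending)
grp <- cumsum(df1$Date %in% starts)

# within each group, Value minus the group's first Value;
# ave() writes results back in the original row order
df1$Change <- ave(df1$Value, grp, FUN = function(x) x - x[1])

# the tapply()/unlist() version gives the same numbers
alt <- unlist(tapply(df1$Value, grp, function(x) cumsum(c(0, diff(x)))))
print(all.equal(df1$Change, unname(alt)))  # TRUE
```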
Answer 1 (score: 0)
Use the dplyr lag function to compute the difference between each row and the previous one: http://dplyr.tidyverse.org/reference/lead-lag.html
I have not tested the following code:
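library(dplyr)

# Value (capital V, as in df1) minus the previous row's Value;
# default = first(Value) makes the first row's change 0 instead of the raw Value
df1 %>%
  mutate(ChangeOverTime = Value - lag(Value, 1, default = first(Value)))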
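For comparison, here is a base R sketch of the same row-to-row difference that needs no packages. It assumes the column is named Value as in the question's df1. Note that this measures the change between consecutive rows, not between consecutive calendar days, so the step from 20120511 to 20120514 counts as a single row.

```r
df1 <- data.frame(
  Date  = c(20120509L, 20120510L, 20120511L, 20120514L,
            20120515L, 20120516L, 20120518L),
  Value = c(1018.53, 1026.5, 1032.41, 1004.87, 999.22, 986.94, 955.98)
)

# change relative to the previous row; diff() is one element shorter,
# so the first row (which has no predecessor) is padded with NA
df1$ChangeOverTime <- c(NA, diff(df1$Value))
print(df1)
```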
However, some dates are missing, e.g. 20120512. You will have to decide what to do with them.
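One way to see which calendar days are absent is to convert the YYYYMMDD integers to Date objects and compare them against a full daily sequence. This is a minimal sketch that only finds the gaps. Whether to drop, carry forward, or interpolate them is left open.

```r
df1 <- data.frame(
  Date  = c(20120509L, 20120510L, 20120511L, 20120514L,
            20120515L, 20120516L, 20120518L),
  Value = c(1018.53, 1026.5, 1032.41, 1004.87, 999.22, 986.94, 955.98)
)

# YYYYMMDD integers -> Date objects
d <- as.Date(as.character(df1$Date), format = "%Y%m%d")

# every calendar day between the first and last observation
all_days <- seq(min(d), max(d), by = "day")

# days with no row in df1 (weekends show up here too)
missing_days <- all_days[!all_days %in% d]
print(format(missing_days, "%Y%m%d"))  # "20120512" "20120513" "20120517"
```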