Calculating daily changes in a time series

Time: 2017-10-14 08:26:02

Tags: python r pandas merge

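I have a time series df1 and a data frame df2 that gives Start and Stop dates together with the Difference in Value between them. Besides the final difference between the dates (as shown in df2), I would like to find the daily differences between start and stop. Does anyone know where to begin? I started by using merge in R to combine df1 and df2, but then I got stuck.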

df1:

Date        Value   
20120509    1018.53 
20120510    1026.5  
20120511    1032.41 
20120514    1004.87 
20120515    999.22  
20120516    986.94  
20120518    955.98  

df2 (dput output):

structure(list(Start = c(20030127L, 20030128L, 20030129L, 20030205L, 
20030210L, 20030228L, 20030307L, 20030310L, 20030313L, 20030331L, 
20030402L, 20030513L, 20030519L, 20030520L, 20030521L, 20030625L, 
20030701L, 20030919L, 20030922L, 20030923L, 20030925L, 20030930L, 
20031112L, 20031120L, 20031128L, 20031217L, 20031218L, 20040130L, 
20040205L, 20040225L, 20040316L, 20040322L, 20040323L, 20040430L, 
20040504L, 20040506L, 20040507L, 20040510L, 20040512L, 20040517L, 
20040621L, 20040622L, 20040708L, 20040709L, 20040712L, 20040719L, 
20040720L, 20040727L, 20040811L, 20040812L, 20040816L, 20040928L, 
20041015L, 20041021L, 20041025L, 20041125L, 20041210L, 20041220L, 
20050121L, 20050124L), Stop = c(20030128L, 20030129L, 20030205L, 
20030210L, 20030217L, 20030307L, 20030310L, 20030311L, 20030320L, 
20030401L, 20030409L, 20030519L, 20030520L, 20030521L, 20030528L, 
20030701L, 20030708L, 20030922L, 20030923L, 20030924L, 20030930L, 
20031007L, 20031119L, 20031127L, 20031205L, 20031218L, 20031230L, 
20040204L, 20040212L, 20040303L, 20040322L, 20040323L, 20040330L, 
20040503L, 20040506L, 20040507L, 20040510L, 20040512L, 20040517L, 
20040525L, 20040622L, 20040630L, 20040709L, 20040712L, 20040719L, 
20040720L, 20040726L, 20040803L, 20040812L, 20040816L, 20040823L, 
20041005L, 20041020L, 20041022L, 20041101L, 20041202L, 20041217L, 
20041228L, 20050124L, 20050131L), Difference = c(-132, -204, 
-455, -1640, 3678, -1516, -610, -247, 4280, -378, 1138, -1386, 
-174, -247, 2003, -431, 2725, -149, -420, -580, -459, 2211, -578, 
1100, 812, -76, 2191, -1009, 2041, 2462, -1109, -277, 1733, -189, 
-815, -161, -694, -153, -141, 932, -473, 1961, -452, -368, -332, 
-83, -737, 664, -465, -632, 2261, 3159, -432, -1000, 2456, 958, 
-463, 419, -310, 1334)), .Names = c("Start", "Stop", "Difference"
), row.names = c(NA, 60L), class = "data.frame")

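2 answers:

Answer 0 (score: 2)

You can do it like this. It works as long as the Date column is sorted in ascending order. It builds a grouping variable by checking which dates in df1 appear in df2$Start, then computes the cumulative sum of the differences within each group; unlist puts the results into a single vector.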

df1$Change <- unlist(tapply(df1$Value, 
                            cumsum(df1$Date %in% df2$Start), 
                            function(x) cumsum(c(0, diff(x)))))

df1
      Date   Value Change
1 20120509 1018.53   0.00
2 20120510 1026.50   7.97
3 20120511 1032.41  13.88
4 20120514 1004.87 -13.66
5 20120515  999.22   0.00
6 20120516  986.94 -12.28
7 20120518  955.98 -43.24
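
To see why this works, it can help to look at the grouping variable on its own. The sketch below rebuilds df1 from the question and assumes a small hypothetical df2 whose Start values are 20120509 and 20120515. Those two values are inferred from where Change resets to 0 in the output above; they are not taken from the question's data.

```r
# df1 as given in the question
df1 <- data.frame(
  Date  = c(20120509L, 20120510L, 20120511L, 20120514L,
            20120515L, 20120516L, 20120518L),
  Value = c(1018.53, 1026.5, 1032.41, 1004.87, 999.22, 986.94, 955.98)
)

# Hypothetical df2: only the Start column matters here (assumed values)
df2 <- data.frame(Start = c(20120509L, 20120515L))

# TRUE on every row that opens a period; cumsum turns that into a group id
grp <- cumsum(df1$Date %in% df2$Start)
grp
# 1 1 1 1 2 2 2

# Within each group, accumulate the row-to-row differences
change <- unlist(tapply(df1$Value, grp, function(x) cumsum(c(0, diff(x)))))
round(unname(change), 2)
# 0.00 7.97 13.88 -13.66 0.00 -12.28 -43.24
```

Note that cumsum(c(0, diff(x))) is the same as x - x[1], so Change is the difference from the first value of each period. If the change from one row to the next is wanted instead, c(0, diff(x)) alone should do.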

Answer 1 (score: 0)

Use the dplyr lag function to compute the difference between a row and the previous row. http://dplyr.tidyverse.org/reference/lead-lag.html

I have not tested the following code:

library(dplyr)

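# The column is named Value (capital V); defaulting the lag to the first
# value makes the first row's change 0 instead of the value itself
df1 %>%
    mutate(ChangeOverTime = Value - lag(Value, 1, default = first(Value)))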

However, some dates are missing, for example 20120512. You will have to decide how to handle them.
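
One possible way to make the missing dates visible, sketched in base R under the assumption that Date holds yyyymmdd integers as in the question. The column names Day, DailyChange and GapDays and the objects calendar and full are made up for this illustration. Whether a gap should be left as NA, carried forward, or spread over the missing days depends on the data and is not decided here.

```r
# df1 as given in the question
df1 <- data.frame(
  Date  = c(20120509L, 20120510L, 20120511L, 20120514L,
            20120515L, 20120516L, 20120518L),
  Value = c(1018.53, 1026.5, 1032.41, 1004.87, 999.22, 986.94, 955.98)
)

# Parse the yyyymmdd integers into Date objects
df1$Day <- as.Date(as.character(df1$Date), format = "%Y%m%d")

# Row-to-row change in Value, and how many calendar days each change spans
df1$DailyChange <- c(NA, diff(df1$Value))
df1$GapDays     <- c(NA, as.integer(diff(df1$Day)))

# Expand to a full calendar so that missing days show up as NA rows
calendar <- data.frame(Day = seq(min(df1$Day), max(df1$Day), by = "day"))
full <- merge(calendar, df1, by = "Day", all.x = TRUE)

# Days with no observation (weekends or holidays in this sample)
full$Day[is.na(full$Value)]
```

A GapDays value greater than 1 marks a change that covers more than one calendar day, which is one way to tell a true daily change from a change across a weekend.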