Comparing two time series

Asked: 2017-04-26 17:47:44

Tags: python algorithm sorting time-series

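Given two arrays of times, A and B, a "match" between them can be defined in different ways:

  • Split the time range into bins and check which A and B times fall into each bin.

  • Set a coincidence criterion around the times in A (or B), such as +/- 1 hour. Then, for each time in A, get the times in B that satisfy the coincidence criterion.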
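
The second definition can be sketched with a sorted search instead of a loop over bins. This is only a minimal sketch: the function name `coincidences` and the `window` parameter (in the same units as the times) are hypothetical and not part of the question. It assumes a closed window [a - window, a + window] around each time in A.

```python
import numpy as np

def coincidences(a, b, window):
    """For each time in a, return the indices of the times in b within +/- window."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Sort b once so that every window lookup is a binary search.
    order = np.argsort(b, kind="stable")
    b_sorted = b[order]
    # First position with b >= a - window, and first position with b > a + window.
    lo = np.searchsorted(b_sorted, a - window, side="left")
    hi = np.searchsorted(b_sorted, a + window, side="right")
    # order[l:h] maps positions in the sorted array back to indices in the original b.
    return [order[l:h] for l, h in zip(lo, hi)]
```

The result has one array per element of A; an empty array means that time has no coincidence in B.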

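So far I have used the first solution. The steps I go through are:

For a time range from t0 to tf and a given time step, I have (tf-t0)/time_step bins (rounded up, so the last bin may extend past tf). First I get the indices of the times in A that fall into each bin: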

span = int(np.ceil((tf-t0)/time_step))  # number of bins, rounded up to an integer
tmin = t0
tempA = [[] for i in range(span)]
for i in range(span):
    tmax = tmin+time_step
    times = A[(A>=tmin) & (A<tmax)]
    times_ID=np.nonzero(np.in1d(A,times))[0]
    tempA[i] = [j for j in times_ID]
    tmin = tmax
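
One possible way to avoid scanning the whole array once per bin is to compute a bin number for every time in a single pass and then group the indices. The sketch below is an assumption about how this could be done, not the code used in the question. The function name `group_indices_by_bin` is hypothetical, and times that fall exactly on a bin edge could land in a different bin than in the loop above because of floating-point rounding.

```python
import numpy as np

def group_indices_by_bin(times, t0, time_step, span):
    """Return a list of length span; entry k holds the indices of the
    times that fall in [t0 + k*time_step, t0 + (k+1)*time_step)."""
    times = np.asarray(times, dtype=float)
    # Bin number of every time, computed in one vectorized step.
    ids = np.floor((times - t0) / time_step).astype(int)
    # Keep only the times that fall inside the binned range.
    valid = np.nonzero((ids >= 0) & (ids < span))[0]
    # Sort the kept indices by bin number (stable, so ties keep their order).
    order = valid[np.argsort(ids[valid], kind="stable")]
    sorted_ids = ids[order]
    # starts[k] is the position where bin k begins in the sorted arrays.
    starts = np.searchsorted(sorted_ids, np.arange(span + 1))
    return [order[starts[k]:starts[k + 1]].tolist() for k in range(span)]
```

The final list comprehension still loops over the bins, but it only slices an already sorted index array instead of comparing every time against the bin limits.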

Then I get the indices of the times in B that fall into each bin, only for bins that are known to contain times from A:

tmin = t0
tempB = [[] for i in range(span)]
for i in range(span):
    tmax=tmin+time_step
    if len(tempA[i])>0:
        times = B[(B>=tmin) & (B<tmax)]
        if len(times)>0:
            times_ID = np.nonzero(np.in1d(B,times))[0]
            tempB[i] = [j for j in times_ID]
    tmin=tmax

Finally, I remove the empty bins from both temporary lists, and I also remove the bins whose corresponding bin in the other list is empty:

tpA=[i for i in tempA if i!=[] and tempB[tempA.index(i)]!=[]] # Matching indices for data in A
tpB=[i for i in tempB if i!=[] and tempA[tempB.index(i)]!=[]] # Matching indices for data in B
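
The filtering step above calls `list.index` for every bin, which searches the list from the start each time. A possible alternative, assuming both lists have one entry per bin in the same order, is to walk them in parallel with `zip`. The function name `keep_matching_bins` is hypothetical.

```python
def keep_matching_bins(temp_a, temp_b):
    """Keep only the bins that are non-empty in both lists."""
    tp_a, tp_b = [], []
    # zip pairs up bin k of temp_a with bin k of temp_b.
    for bin_a, bin_b in zip(temp_a, temp_b):
        if bin_a and bin_b:  # empty lists are falsy
            tp_a.append(bin_a)
            tp_b.append(bin_b)
    return tp_a, tp_b
```

It would be called as `tpA, tpB = keep_matching_bins(tempA, tempB)`.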

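The loops that fill the time bins can be very time-consuming. I am not sure how to make them more efficient. Is there a smarter way to do this?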
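
If only the indices of the matching times are needed, and not one list per bin, one option is to skip the per-bin lists and compare bin numbers directly. The following is a rough sketch under that assumption. The function name `matched_indices` is hypothetical, and `np.isin` requires NumPy 1.13 or later.

```python
import numpy as np

def matched_indices(a, b, t0, time_step, span):
    """Return the bins occupied by both a and b, plus the indices of the
    times in a and in b that fall into those shared bins."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Bin number of every time in a and in b.
    ids_a = np.floor((a - t0) / time_step).astype(int)
    ids_b = np.floor((b - t0) / time_step).astype(int)
    # Sorted, unique bin numbers that appear in both arrays.
    common = np.intersect1d(ids_a, ids_b)
    # Discard bins outside the range [0, span).
    common = common[(common >= 0) & (common < span)]
    idx_a = np.nonzero(np.isin(ids_a, common))[0]
    idx_b = np.nonzero(np.isin(ids_b, common))[0]
    return common, idx_a, idx_b
```

Here `idx_a` and `idx_b` are flat index arrays, so `A[idx_a]` and `B[idx_b]` give the matching times directly.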

Edit

Here is a running example:

import numpy as np
from math import ceil
import time
import sys

def progress(i,tot,bar_length=20):
    percent=(float(i)+1.0)/tot
    hashes='#' * int(round(percent*bar_length))
    spaces=' ' * (bar_length - len(hashes))
    sys.stdout.write("\rPercent:[{0}] {1}%".format(hashes + spaces, int(round(percent * 100)))+"    "+str(i+1)+"/"+str(tot))
    sys.stdout.flush()

t0 = time.time()

tf = t0 + 3600*24*365*10 # ten years

time_step = 2*3600.0 # 2 hours

span = int(ceil((tf-t0)/time_step))

A=[t0]
while A[-1]<tf:
    A.append(t0+len(A)*time_step/3) # 40 min time interval
B=[t0]
while B[-1]<tf:
    B.append(t0+len(B)*time_step/10) # 12 min time interval

A=np.array(A[:int(0.75*len(A))]) # shorten the list of A times
B=np.array(B[:int(len(B)/4)]+B[int(len(B)/4)+10800:]) # put a 3 month gap somewhere in the B list

tmin = t0
tempA = [[] for i in range(span)]
print('Fill tempA bins with indices of times from A')
for i in range(span):
    progress(i,span)
    tmax = tmin+time_step
    times = A[(A>=tmin) & (A<tmax)]
    times_ID=np.nonzero(np.in1d(A,times))[0]
    tempA[i] = [j for j in times_ID]
    tmin = tmax
print('\nA has {} intervals of {} seconds with data between t0 and tf\n'.format(len([i for i in tempA if i!=[]]), time_step))

tmin = t0
tempB = [[] for i in range(span)]
print('\nFill tempB bins with indices of times from B, only when the corresponding bin in tempA is not empty')
for i in range(span):
    progress(i,span)
    tmax=tmin+time_step
    if len(tempA[i])>0:
        times = B[(B>=tmin) & (B<tmax)]
        if len(times)>0:
            times_ID = np.nonzero(np.in1d(B,times))[0]
            tempB[i] = [j for j in times_ID]
    tmin=tmax  # advance to the next bin on every iteration, even when the A bin is empty
print('\nB has {} intervals of {} seconds with data matching A times\n'.format(len([i for i in tempB if i!=[]]), time_step))

print('\nOnly select matching bins from both lists')
tpA=[i for i in tempA if i!=[] and tempB[tempA.index(i)]!=[]] # Matching indices for data in A
tpB=[i for i in tempB if i!=[] and tempA[tempB.index(i)]!=[]] # Matching indices for data in B

0 answers:

No answers