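Given two arrays of times, A and B, "matching" can be defined in different ways:
1. Split the time range into bins and check which A and B times fall into each bin.
2. Set a coincidence criterion for A (or B), such as +/- 1 hour. Then, for each time in A, get the times in B that satisfy the coincidence criterion.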
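For comparison, the second definition (a coincidence window) could be sketched as follows. This is only a minimal illustration, not something taken from the code in this question: the function name `coincidences` and the sample values are hypothetical, and it assumes B is sorted in ascending order so that `np.searchsorted` can be used.

```python
import numpy as np

def coincidences(A, B, window):
    """For each time in A, return the indices of B within +/- window.

    Hypothetical helper; assumes B is sorted in ascending order.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # First index of B with B >= a - window ...
    lo = np.searchsorted(B, A - window, side="left")
    # ... and first index of B with B > a + window.
    hi = np.searchsorted(B, A + window, side="right")
    return [np.arange(l, h) for l, h in zip(lo, hi)]

# Made-up sample times in seconds, with a +/- 1 hour window.
A_demo = np.array([0.0, 10000.0])
B_demo = np.array([-4000.0, -1000.0, 3000.0, 9000.0, 20000.0])
matches = coincidences(A_demo, B_demo, window=3600.0)
```

Each entry of `matches` holds the indices of B that coincide with the corresponding time in A, so an empty array means that time in A has no match.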
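So far I have used the first solution. The steps I go through are:
For a time range from t0 to tf and a given time_step, I have (tf-t0)/time_step bins (rounded up, since I don't mind the last bin extending past tf). First I get the indices of the times in A that fit in each bin: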
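from math import ceil
span = int(ceil((tf-t0)/time_step))  # number of bins, rounded up to an integer
tmin = t0
tempA = [[] for i in range(span)]
for i in range(span):
    tmax = tmin+time_step
    times = A[(A>=tmin) & (A<tmax)]  # times from A inside bin i
    times_ID = np.nonzero(np.in1d(A,times))[0]  # their indices in A
    tempA[i] = [j for j in times_ID]
    tmin = tmax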
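One way the per-bin loop might be avoided is to compute the bin number of every time in a single vectorized step and then group the indices by bin. The sketch below is an assumption-laden illustration rather than a drop-in replacement: `bin_indices` is a hypothetical name, it assumes at least one bin, and because it divides by time_step instead of accumulating tmin, times that sit exactly on a bin edge may land in a different bin than in the loop because of floating-point rounding.

```python
import numpy as np

def bin_indices(times, t0, tf, time_step):
    """Return one array of indices into `times` per bin (hypothetical helper)."""
    times = np.asarray(times, dtype=float)
    n_bins = int(np.ceil((tf - t0) / time_step))
    # Bin number of every time, computed without a Python loop.
    bin_id = np.floor((times - t0) / time_step).astype(int)
    # Keep only the times that fall inside one of the n_bins bins.
    valid = np.nonzero((bin_id >= 0) & (bin_id < n_bins))[0]
    # A stable sort groups indices by bin and keeps them ascending within a bin.
    order = valid[np.argsort(bin_id[valid], kind="stable")]
    counts = np.bincount(bin_id[valid], minlength=n_bins)
    # Split the grouped indices at the bin boundaries.
    return np.split(order, np.cumsum(counts)[:-1])

# Made-up sample: 4 bins of width 2 between 0 and 8.
bins_demo = bin_indices([0.5, 1.5, 2.5, 7.0, 0.1], t0=0.0, tf=8.0, time_step=2.0)
```

The result plays the role of tempA (or tempB) above, except that each bin holds a NumPy array instead of a list, so empty bins should be tested with `len(b) == 0` rather than `b != []`.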
Then I get the indices of the times in B that fit in each bin, but only for bins that are known to contain times from A:
tmin = t0
tempB = [[] for i in range(span)]
for i in range(span):
    tmax=tmin+time_step
    if len(tempA[i])>0:
        times = B[(B>=tmin) & (B<tmax)]
        if len(times)>0:
            times_ID = np.nonzero(np.in1d(B,times))[0]
            tempB[i] = [j for j in times_ID]
    tmin=tmax
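Finally, for both temporary lists, I get rid of the empty bins. I also get rid of the bins whose corresponding bin in the other list is empty: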
tpA=[i for i in tempA if i!=[] and tempB[tempA.index(i)]!=[]] # Matching indices for data in A
tpB=[i for i in tempB if i!=[] and tempA[tempB.index(i)]!=[]] # Matching indices for data in B
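The final selection can also be written by walking both lists in parallel, which avoids the repeated `list.index` lookups (each one scans the list from the start). This is a small sketch under the assumption that both lists have one entry per bin in the same order; `matching_bins` and the sample lists are hypothetical.

```python
def matching_bins(tempA, tempB):
    """Keep only the bins that are non-empty in both lists (hypothetical helper)."""
    # zip pairs bin i of tempA with bin i of tempB.
    pairs = [(a, b) for a, b in zip(tempA, tempB) if len(a) > 0 and len(b) > 0]
    tpA = [a for a, _ in pairs]  # matching indices for data in A
    tpB = [b for _, b in pairs]  # matching indices for data in B
    return tpA, tpB

# Made-up per-bin index lists.
tpA_demo, tpB_demo = matching_bins([[0, 1], [], [2], [3]],
                                   [[0], [1], [], [2, 3]])
```

Because the test uses `len(...)`, the same function should work whether the bins are lists or NumPy arrays.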
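The loops that fill the time bins can be very time-consuming, and I am not sure how to make them more efficient. Is there a smarter way to do this?
EDIT
Here is a running example: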
import numpy as np
from math import ceil
import time
import sys
def progress(i,tot,bar_length=20):
    percent=(float(i)+1.0)/tot
    hashes='#' * int(round(percent*bar_length))
    spaces=' ' * (bar_length - len(hashes))
    sys.stdout.write("\rPercent:[{0}] {1}%".format(hashes + spaces, int(round(percent * 100)))+" "+str(i+1)+"/"+str(tot))
    sys.stdout.flush()
t0 = time.time()
tf = t0 + 3600*24*365*10 # ten years
time_step = 2*3600.0 # 2 hours
span = int(ceil((tf-t0)/time_step))
A=[t0]
while A[-1]<tf:
    A.append(t0+len(A)*time_step/3) # 40 min time interval
B=[t0]
while B[-1]<tf:
    B.append(t0+len(B)*time_step/10) # 12 min time interval
A=np.array(A[:int(0.75*len(A))]) # shorten the list of A times
B=np.array(B[:int(len(B)/4)]+B[int(len(B)/4)+10800:]) # put a 3 month gap somewhere in the B list
tmin = t0
tempA = [[] for i in range(span)]
print('Fill tempA bins with indices of times from A')
for i in range(span):
    progress(i,span)
    tmax = tmin+time_step
    times = A[(A>=tmin) & (A<tmax)]
    times_ID=np.nonzero(np.in1d(A,times))[0]
    tempA[i] = [j for j in times_ID]
    tmin = tmax
print('\nA has {0} intervals of {1} seconds with data between t0 and tf\n'.format(len([i for i in tempA if i!=[]]), time_step))
tmin = t0
tempB = [[] for i in range(span)]
print('\nFill tempB bins with indices of times from B, only when the corresponding bin in tempA is not empty')
for i in range(span):
    progress(i,span)
    tmax=tmin+time_step
    if len(tempA[i])>0:
        times = B[(B>=tmin) & (B<tmax)]
        if len(times)>0:
            times_ID = np.nonzero(np.in1d(B,times))[0]
            tempB[i] = [j for j in times_ID]
    tmin=tmax
print('\nB has {0} intervals of {1} seconds with data matching A times\n'.format(len([i for i in tempB if i!=[]]), time_step))
print('\nOnly select matching bins from both lists')
tpA=[i for i in tempA if i!=[] and tempB[tempA.index(i)]!=[]] # Matching indices for data in A
tpB=[i for i in tempB if i!=[] and tempA[tempB.index(i)]!=[]] # Matching indices for data in B