Question

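I have two datasets, a[ts1] and b[ts2], where ts1 and ts2 are timestamps taken at different times (on different time bases?). I wanted to plot b[ts2]-a[ts1], but I think I made a mistake: the plotting software understood that I wanted b[i]-a[i], where i is the index order of the values.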

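So I wanted to build a small example of this with numpy, and I realized that I have no idea if and how numpy can perform this operation using vectors and avoiding for loops. I've made an example (below), which defines a[ts1] and b[ts2] as numpy structured arrays, named a_np and b_np:

array([(0.0, 0.0), (0.8865606188774109, 0.30000001192092896),
       (1.6939274072647095, 0.6000000238418579),
       (2.3499808311462402, 0.8999999761581421)], ...
      dtype=[('a', '<f4'), ('ts1', '<f4')])
array([(0.3973386585712433, 0.10000000149011612),
       (0.7788366675376892, 0.20000000298023224),
       (1.4347121715545654, 0.4000000059604645),
       (1.6829419136047363, 0.5)], ...
      dtype=[('b', '<f4'), ('ts2', '<f4')])

So my questions are:

What are these kinds of arrays/problems called? Are they just "time series" arrays? Basically, they describe a one-dimensional signal, but since the timestamps have to be kept, it becomes a two-dimensional array; and since the "time" column can have any meaning, I guess this generalizes to interpolating the values in the (value) column against the "index" values in the (time/index) column.
Can numpy do vectorized operations where the arrays are properly interpolated in "time" before being subtracted?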

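While looking for information on this, I found pandas: Python Data Analysis Library; I guess I should be using that instead, given that it has "time series" functionality. In this case, though, I don't need any fancy interpolation of the sample values, just a "step" or "hold" one (basically, no interpolation), which is why I wondered whether numpy could do this in a vectorized fashion. Otherwise, the example below uses for loops.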
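The "step"/"hold" resampling described above can be sketched in plain numpy with np.searchsorted, without a for loop. This is only a minimal sketch, not the method from the original post: the helper name hold_resample and the small sample arrays are made up for illustration, timestamps are assumed to be sorted in ascending order, and times before the first sample are assumed to yield NaN (the loop in the question starts from 0 instead).

```python
import numpy as np

def hold_resample(t_new, t_src, v_src):
    """Zero-order hold (hypothetical helper): for each time in t_new,
    return the most recent sample of (t_src, v_src).
    t_src must be sorted ascending; times before the first source
    sample give NaN because there is nothing to hold yet."""
    t_new = np.asarray(t_new, dtype=float)
    v_src = np.asarray(v_src, dtype=float)
    # index of the last source timestamp that is <= each new timestamp
    idx = np.searchsorted(t_src, t_new, side="right") - 1
    held = v_src[np.clip(idx, 0, None)]
    return np.where(idx >= 0, held, np.nan)

# Illustrative data (not the data from the question)
ts1 = np.array([0.0, 0.3, 0.6, 0.9])
a = np.array([0.0, 1.0, 2.0, 3.0])
ts2 = np.array([0.1, 0.2, 0.4, 0.5])
b = np.array([10.0, 20.0, 40.0, 50.0])

tsd = np.union1d(ts1, ts2)            # common, sorted time base
a_hold = hold_resample(tsd, ts1, a)   # a held at every time in tsd
b_hold = hold_resample(tsd, ts2, b)   # b held at every time in tsd
d2 = b_hold - a_hold                  # time-aligned difference
print(np.column_stack((tsd, d2)))
```

With structured arrays the same call would take the fields directly, for example hold_resample(tsd, a_np['ts1'], a_np['a']).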

The example produces an image like this:

array-timeseries

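Arrays a and b represent values taken at different times, indicated by their respective impulses; a is drawn with lines (so it is linearly interpolated before plotting), while b is drawn with steps (to show the actual values that exist).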

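Array d1 represents the "original" difference b[t]-a[t], taken at the time the arrays were constructed; obviously, I have no access to this data in the real case, so I have to work with the sampled values. In that case, the difference b[ts2]-a[ts1] is shown as array/signal d2, again drawn with steps to emphasize the errors made against the "original". This d2 is what I would like to calculate with numpy (but below, it is calculated in the same for loop).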

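The mistake I made with my plotting software was to take the index-wise difference of a and b, that is b[i]-a[i]; this is shown as array/signal e, and as the plot shows, it is way off from what it is supposed to represent. This is the case only when the sampling intervals of the two signals are uneven; try modulowith = 2 in the code, and then e is actually not that bad. However, my real-life case has uneven timestamps, so b[i]-a[i] doesn't help me at all.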

Here is the code, which also calls gnuplot (tested on Python 2.7, numpy 1.5 I think):

import subprocess
import math, random
import numpy as np
from pprint import pprint
from numpy.lib.recfunctions import append_fields

step = 0.1
modulowith = 3
# must init all arrays separately;
# a=b=[] makes a==b by reference!
ts1 = []; ts2 = [] ; tsd = []
valsa = []; valsb = []; valsd1 = []; valsd2 = []
stra = strb = strd1 = strd2 = "" ; kval1 = kval2 = 0
for ix in range(0, 100, 1):
  ts = ix*step
  val1 = 3.0*math.sin(ts) #+random.random()
  val2 = 2.0*math.sin(2.0*ts)
  # every modulowith-th sample goes to a, all others go to b
  if ( ix%modulowith == 0):
    ts1.append(ts) ; valsa.append(val1)
    stra += "%.03f %.06f\n" % (ts, val1)
    kval1 = val1
  else:
    ts2.append(ts) ; valsb.append(val2)
    strb += "%.03f %.06f\n" % (ts, val2)
    kval2 = val2
  tsd.append(ts)
  # d1: "original" difference; d2: difference of the held samples
  valb = val2 - val1 ; valsd1.append(valb)
  strd1 += "%.03f %.06f\n" % (ts, valb)
  valc = kval2 - kval1 ; valsd2.append(valc)
  strd2 += "%.03f %.06f\n" % (ts, valc)

a_np = np.array(
  [(_valsa,) for _valsa in valsa],
  dtype=[('a','f4')]
)
b_np = np.array(
  [(_valsb,) for _valsb in valsb],
  dtype=[('b','f4')]
)
a_np = append_fields(a_np, names='ts1', data=ts1, dtypes='f4', usemask=False)
b_np = append_fields(b_np, names='ts2', data=ts2, dtypes='f4', usemask=False)

pprint(a_np[:4])
pprint(b_np[:4])

# e_np = np.subtract(b_np['b'],a_np['a'])
# (via field reference) is same as doing:
# e_np = np.subtract(np.array(valsb, dtype="f4"), np.array(valsa, dtype="f4"))
# but for different sized arrays, must do:
e_np = b_np['b'] - np.resize(a_np, b_np.shape)['a']
pprint(e_np[:4])
e_str = ""
for ts, ie in zip(ts2, e_np):
  e_str += "%.03f %.06f\n" % (ts, ie)

gpscript = """
plot "-" using 1:2 with lines lc rgb "green" t"a", \\
"" using 1:2 with impulses lc rgb "green" t"", \\
"" using 1:2 with steps lc rgb "blue" t"b", \\
"" using 1:2 with impulses lc rgb "blue" t"", \\
"" using 1:2 with lines lc rgb "red" t"d1", \\
"" using 1:2 with steps lc rgb "orange" t"d2", \\
"" using 1:2 with steps lc rgb "brown" t"e"
-
{0}
e
{0}
e
{1}
e
{1}
e
{2}
e
{3}
e
{4}
e
""".format(stra, strb, strd1, strd2, e_str)

proc = subprocess.Popen(
  ['gnuplot','--persist'],
  shell=False,
  stdin=subprocess.PIPE,
)
# encode so that the pipe also accepts the script on Python 3
proc.communicate(gpscript.encode())

Thanks to the answer by @runnerup, here is a slightly verbose (for syntax example purposes) numpy-only solution:

# create union of both timestamp arrays as tsz
ntz = np.union1d(b_np['ts2'], a_np['ts1'])
# interpolate `a` values over tsz
a_z = np.interp(ntz, a_np['ts1'], a_np['a'])
# interpolate `b` values over tsz
b_z = np.interp(ntz, b_np['ts2'], b_np['b'])
# create structured arrays for resampled `a` and `b`,
# indexed against tsz timestamps
a_npz = np.array( [ (tz,az) for tz,az in zip(ntz,a_z) ],
  dtype=[('tsz', 'f4'), ('a', 'f4')] )
b_npz = np.array( [ (tz,bz) for tz,bz in zip(ntz,b_z) ],
  dtype=[('tsz', 'f4'), ('b', 'f4')] )
# subtract resized array
e_npz = np.subtract(b_npz['b'], a_npz['a'])
e_str = ""
# check:
pprint(e_npz[:4])
# gnuplot string:
for ts, ie in zip(ntz, e_npz):
  e_str += "%.03f %.06f\n" % (ts, ie)

This is linearly interpolated, so it differs from d2 above, but it still fits pretty well.

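Without the for loops that create the arrays, this is vectorized; in principle I shouldn't even need to create those arrays, I just wanted to see how they would look as structured arrays. All in all, I guess I was hoping for a one-liner that would do this with structured arrays (that is, one that handles the field names too).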
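As a rough illustration of what such a field-aware one-liner might look like, the steps of the numpy-only solution can be wrapped in a small function. This is a sketch under assumptions: diff_on_union and its parameters are hypothetical names, not a numpy API; the field names default to those used in the question; and the sample arrays are invented. Note that np.interp holds the first/last value outside the sampled range rather than extrapolating.

```python
import numpy as np

def diff_on_union(a_np, b_np, a_fields=("ts1", "a"), b_fields=("ts2", "b")):
    """Hypothetical helper: linearly resample two structured arrays onto
    the union of their timestamps and return a structured array with
    fields 'ts' (time) and 'd' (b - a)."""
    ta, va = a_np[a_fields[0]], a_np[a_fields[1]]
    tb, vb = b_np[b_fields[0]], b_np[b_fields[1]]
    ts = np.union1d(ta, tb)                            # sorted union of timestamps
    d = np.interp(ts, tb, vb) - np.interp(ts, ta, va)  # b - a on the common base
    out = np.empty(ts.shape, dtype=[("ts", "f8"), ("d", "f8")])
    out["ts"] = ts   # field assignment is vectorized, no list comprehension
    out["d"] = d
    return out

# Illustrative data (not the data from the question)
a_np = np.array([(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)],
                dtype=[("a", "f8"), ("ts1", "f8")])
b_np = np.array([(10.0, 0.5), (20.0, 1.5)],
                dtype=[("b", "f8"), ("ts2", "f8")])

res = diff_on_union(a_np, b_np)
print(res["ts"], res["d"])
```

Filling the output through field assignment avoids the zip-based list comprehensions used to build a_npz and b_npz, so the whole computation stays inside numpy.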

Answer 1

This is an attempt to sell you on switching to pandas :)

import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

# one minute interval
start = dt.datetime.now( )
end = start + dt.timedelta( minutes=1 )

# sine curve at one-second frequency
# ('s' and 'ms' replace the older 'S' and 'L' aliases, which newer pandas deprecates)
idx1 = pd.date_range( start, end, freq='s' )
ts1 = pd.Series( np.sin( np.linspace( 0, 4 * np.pi, len( idx1 ) ) ), index=idx1 )

# cosine curve at millisecond frequency
idx2 = pd.date_range( start, end, freq='ms' )
ts2 = pd.Series( np.cos( np.linspace( 0, 4 * np.pi, len( idx2 ) ) ), index=idx2 )

Now len( ts1 ) = 61 and len( ts2 ) = 60001, so the two series have different frequencies.

fig = plt.figure( figsize=(8, 6) )
ax = fig.add_axes( [.05, .05, .9, .9] )

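# pass the axes by keyword; the first positional argument of plot is not the axes
ts1.plot( ax=ax, color='DarkBlue' )
ts2.plot( ax=ax, color='DarkRed' )

# reindex ts2 like ts1 (keeps only the timestamps that also occur in ts1)
ts2 = ts2.reindex_like( ts1 )
(ts1 - ts2).plot( ax=ax, color='DarkGreen' )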

You get:

time series
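The reindex_like call above works because every timestamp of ts1 also occurs in ts2. For interleaved, uneven timestamps like those in the question, where the two indexes share no points, one possible variation is to reindex both series onto the union of the indexes with forward fill, which gives the "hold" behaviour. The sketch below is not part of the original answer and uses made-up data with a plain float index.

```python
import pandas as pd

# Illustrative data with interleaved, uneven timestamps
a = pd.Series([0.0, 1.0, 2.0, 3.0], index=[0.0, 0.3, 0.6, 0.9])
b = pd.Series([10.0, 20.0, 40.0, 50.0], index=[0.1, 0.2, 0.4, 0.5])

# common time base: sorted union of both indexes
union = a.index.union(b.index)

# forward fill = hold the most recent sample; no earlier sample gives NaN
a_hold = a.reindex(union, method="ffill")
b_hold = b.reindex(union, method="ffill")

d2 = b_hold - a_hold
print(d2)
```

The first entry of d2 is NaN because b has no sample at or before t = 0.0; calling d2.dropna() would remove it if that is preferred.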

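Edit: for interpolation purposes, you can use the non-parametric methods in statsmodels; basically, you can interpolate one series at the frequency of the other series and then subtract the two: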

import statsmodels.api as sm
n = 1000
x = np.linspace( 0, 1, n )
y = np.random.randn( n ).cumsum( )
z = sm.nonparametric.lowess( y, x, return_sorted=False, frac=.05)

ax.plot( x, y, 'Blue', linestyle='--' )
ax.plot( x, z, color='DarkRed' )
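The "interpolate one series at the timestamps of the other, then subtract" step can also be sketched with pandas alone. Note that this swaps in plain linear interpolation (interpolate with method="index") instead of the lowess smoother shown above, and that the data and variable names are invented for illustration.

```python
import pandas as pd

# Illustrative data: a sampled at 0, 1, 2 and b sampled in between
a = pd.Series([0.0, 2.0, 4.0], index=[0.0, 1.0, 2.0])
b = pd.Series([10.0, 20.0], index=[0.5, 1.5])

a_on_b = (
    a.reindex(a.index.union(b.index))  # add b's timestamps (as NaN) to a
     .interpolate(method="index")      # linear in the index values, not in position
     .reindex(b.index)                 # keep only b's timestamps
)

diff = b - a_on_b
print(diff)
```

method="index" matters for uneven sampling: the default linear method treats the samples as equally spaced and ignores the index values.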

Subtracting two interleaved, differently based time-series arrays with numpy?
