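I need to interactively plot some large arrays (about 100 million points). I am currently using Matplotlib. Plotting the arrays as-is is very slow, and it is wasteful, since you can't see that many points anyway.
So I wrote a min/max decimation function and hooked it to the axes' 'xlim_changed' callback. I went with a min/max approach because the data contains fast spikes that I don't want to miss by simply stepping through the data. There are more wrappers that crop to the x-limits and skip processing under certain conditions, but the relevant part is below: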
import numpy as np

def min_max_downsample(x, y, num_bins):
    """Break the data into num_bins and return the min/max for each bin."""
    pts_per_bin = x.size // num_bins

    # Create temp to hold the reshaped & slightly cropped y
    y_temp = y[:num_bins*pts_per_bin].reshape((num_bins, pts_per_bin))
    y_out = np.empty((num_bins, 2))
    # Take the max/min by rows (max goes first in each pair)
    y_out[:, 0] = y_temp.max(axis=1)
    y_out[:, 1] = y_temp.min(axis=1)
    y_out = y_out.ravel()

    # This duplicates the x-value (left bin edge) for each min/max y-pair
    x_out = np.empty((num_bins, 2))
    x_out[:] = x[:num_bins*pts_per_bin:pts_per_bin, np.newaxis]
    x_out = x_out.ravel()
    return x_out, y_out
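This works well and is fast enough (~80 ms on 1e8 points and 2k bins). There is very little lag as it periodically recalculates and updates the line's x and y data.
However, my only complaint is with the x-data. This code duplicates the x-value of each bin's left edge and doesn't return the true x-location of the y min/max pairs. I typically set the number of bins to double the axes pixel width, so you can't really see the difference because the bins are so small... but I know it's there... and it bugs me.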
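To make the x-offset concrete, here is a small sketch on made-up toy data (12 samples, 3 bins). It compares the left-edge x-value that version 1 reports for each bin with the x-value where the bin's maximum actually sits. The variable names are illustrative only.

```python
import numpy as np

# Toy data (made up): 12 samples split into 3 bins of 4 points each.
x = np.arange(12, dtype=float)
y = np.array([0, 1, 5, 2,   3, 3, 3, 9,   7, 0, 1, 2], dtype=float)
num_bins = 3
pts_per_bin = x.size // num_bins

x_bins = x.reshape(num_bins, pts_per_bin)
y_bins = y.reshape(num_bins, pts_per_bin)

# What version 1 reports: the left edge of each bin.
x_left_edge = x_bins[:, 0]

# Where each bin's maximum really is.
rows = np.arange(num_bins)
x_true_max = x_bins[rows, y_bins.argmax(axis=1)]

# Largest x error, in samples; it can never exceed pts_per_bin - 1.
max_offset = np.abs(x_true_max - x_left_edge).max()
print(x_left_edge, x_true_max, max_offset)
```

The error is bounded by one bin width, which is why it is invisible when there are about two bins per pixel.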
So attempt number 2 does return the actual x-values for each min/max pair. However, it is about 5x slower.
def min_max_downsample_v2(x, y, num_bins):
    pts_per_bin = x.size // num_bins
    # Create temp to hold the reshaped & slightly cropped y
    y_temp = y[:num_bins*pts_per_bin].reshape((num_bins, pts_per_bin))
    # Use argmax/argmin to get column locations
    cc_max = y_temp.argmax(axis=1)
    cc_min = y_temp.argmin(axis=1)
    rr = np.arange(0, num_bins)
    # Compute the flat index to where these are
    flat_max = cc_max + rr*pts_per_bin
    flat_min = cc_min + rr*pts_per_bin
    # Create a boolean mask of these locations
    mm_mask = np.full((x.size,), False)
    mm_mask[flat_max] = True
    mm_mask[flat_min] = True
    x_out = x[mm_mask]
    y_out = y[mm_mask]
    return x_out, y_out
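This takes roughly 400+ ms on my machine, which becomes quite noticeable. So my question is basically: is there a way to go faster and still get the same results? The bottleneck is mostly in the numpy.argmin and numpy.argmax functions, which are a good deal slower than numpy.min and numpy.max.
The answer might be to just live with version 1, since it doesn't really matter visually. Or maybe try to speed it up with something like Cython (which I have never used).
FYI, I'm using Python 3.6.4 on Windows. Example usage would be something like this: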
x_big = np.linspace(0, 10, 100000000)
y_big = np.cos(x_big)
x_small, y_small = min_max_downsample(x_big, y_big, 2000)     # Fast but not exactly correct.
x_small, y_small = min_max_downsample_v2(x_big, y_big, 2000)  # Correct but not exactly fast.
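Answer 0 (score: 3)
I managed to get improved performance by using the output of arg(min|max) directly to index the data arrays. This comes at the cost of an extra call to np.sort, but the axis to be sorted has only two elements (the min/max indices) and the overall array is rather small (its length is the number of bins):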
def min_max_downsample_v3(x, y, num_bins):
    pts_per_bin = x.size // num_bins

    x_view = x[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    y_view = y[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    i_min = np.argmin(y_view, axis=1)
    i_max = np.argmax(y_view, axis=1)

    # Two output points per bin: row index repeated, column indices sorted
    r_index = np.repeat(np.arange(num_bins), 2)
    c_index = np.sort(np.stack((i_min, i_max), axis=1)).ravel()

    return x_view[r_index, c_index], y_view[r_index, c_index]
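The index bookkeeping at the end of the function can be followed on a toy case. This sketch (the 2x4 arrays are made up for illustration) shows how sorting each (argmin, argmax) pair keeps the returned samples in their original left-to-right order, so the plotted line never runs backwards in x.

```python
import numpy as np

# Two bins of four samples each (made-up values).
y_view = np.array([[4., 9., 1., 5.],
                   [2., 0., 8., 3.]])
x_view = np.arange(8, dtype=float).reshape(2, 4)

i_min = np.argmin(y_view, axis=1)   # column of the minimum in each bin
i_max = np.argmax(y_view, axis=1)   # column of the maximum in each bin

# Each bin contributes two points, so every row index appears twice.
r_index = np.repeat(np.arange(2), 2)

# Sort each (min, max) column pair so the earlier sample comes first.
pairs = np.stack((i_min, i_max), axis=1)
c_index = np.sort(pairs).ravel()

x_out = x_view[r_index, c_index]
y_out = y_view[r_index, c_index]
print(x_out, y_out)
```

In the first bin the maximum comes before the minimum, and the sort preserves that order in the output instead of always emitting min first.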
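I checked the timings for your example and obtained:
min_max_downsample_v1: 110 ms ± 5 ms
min_max_downsample_v2: 240 ms ± 8.01 ms
min_max_downsample_v3: 164 ms ± 1.23 ms
I also checked returning directly after the calls to arg(min|max), and the result was likewise 164 ms on average, i.e. there is no real overhead after that point.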
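Answer 1 (score: 2)
So this doesn't address speeding up the specific function in question, but it does show a few ways of plotting a line with a large number of points reasonably efficiently. It assumes that the x points are ordered and uniformly (or close to uniformly) sampled.
Setup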
from pylab import *
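This is a function I like that reduces the number of points by randomly choosing one in each interval. It isn't guaranteed to show every peak in the data, but it doesn't have as many problems as directly decimating the data, and it is fast.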
def calc_rand(y, factor):
    split = y[:len(y)//factor*factor].reshape(-1, factor)
    idx = randint(0, split.shape[-1], split.shape[0])
    return split[arange(split.shape[0]), idx]
Here's the min and max version, to see the envelope of the signal:
def calc_env(y, factor):
    """
    y : 1D signal
    factor : amount to reduce y by (actually returns twice this for min and max)
    Calculate envelope (interleaved max and min points) for y
    """
    split = y[:len(y)//factor*factor].reshape(-1, factor)
    upper = split.max(axis=-1)
    lower = split.min(axis=-1)
    return c_[upper, lower].flatten()
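As a quick check of what the envelope reduction returns, here is a small sketch on made-up data. It restates calc_env with an explicit `np.` prefix so that it runs without the pylab star import; the input values are arbitrary.

```python
import numpy as np

def calc_env(y, factor):
    """Interleaved (max, min) per block of `factor` samples."""
    split = y[:len(y)//factor*factor].reshape(-1, factor)
    return np.c_[split.max(axis=-1), split.min(axis=-1)].flatten()

# Ten made-up samples; with factor=3 the last sample does not fill a block.
y = np.array([1., 4., 2.,   0., -3., 5.,   7., 7., 6.,   9.])
env = calc_env(y, 3)
print(env)
```

Two things are worth noticing: the output holds two points per block (max first, then min), and trailing samples that do not fill a complete block are dropped.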
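The following function can take either of these and uses it to reduce the data being drawn. The number of points actually taken is 5000 by default, which should far exceed a monitor's resolution. Data is cached after it is reduced. Memory may be an issue, especially with large amounts of data, but it shouldn't exceed the amount required by the original signal.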
def plot_bigly(x, y, *, ax=None, M=5000, red=calc_env, **kwargs):
    """
    x : the x data
    y : the y data
    ax : axis to plot on
    M : The maximum number of line points to display at any given time
    red : reduction function, e.g. calc_env or calc_rand
    kwargs : passed to line
    """
    assert x.shape == y.shape, "x and y data must have same shape!"
    if ax is None:
        ax = gca()

    cached = {}

    # Setup line to be drawn beforehand, note this doesn't increment line properties so
    # style needs to be passed in explicitly
    line = plt.Line2D([], [], **kwargs)

    def update(xmin, xmax):
        """
        Update line data

        precomputes and caches entire line at each level, so initial
        display may be slow but panning and zooming should speed up after that
        """
        # Index range of the samples inside the current x-limits
        imin = max(np.searchsorted(x, xmin) - 1, 0)
        imax = min(np.searchsorted(x, xmax) + 1, y.shape[0])
        L = imax - imin + 1
        # Find nearest power of two as a factor to downsample by
        factor = max(2**int(round(np.log(L/M) / np.log(2))), 1)

        # only calculate the reduction if it hasn't been cached
        if factor not in cached:
            cached[factor] = red(y, factor=factor)

        # Points returned per block of `factor` samples:
        # 1 for calc_rand, 2 for calc_env (interleaved max/min)
        per_block = cached[factor].shape[0] // max(y.shape[0] // factor, 1)

        ## Make sure lengths match correctly here, by ensuring at least
        # "factor" points for each x point, then matching y length
        # this assumes x has uniform sample spacing - but could be modified
        nblk = (imax - imin) // factor
        newx = np.repeat(x[imin:imin + nblk*factor:factor], per_block)
        start = (imin // factor) * per_block
        newy = cached[factor][start:start + newx.shape[-1]]
        assert newx.shape == newy.shape, "decimation error {}/{}!".format(newx.shape, newy.shape)

        ## Update line data
        line.set_xdata(newx)
        line.set_ydata(newy)

    update(x[0], x[-1])
    ax.add_line(line)

    ## Manually update limits of axis, as adding line doesn't do this
    # if drawing multiple lines this can quickly slow things down, and some
    # sort of check should be included to prevent unnecessary changes in limits
    # when a line is first drawn.
    # Note: the upper x-limit uses the last sample, x[-1]
    ax.set_xlim(min(ax.get_xlim()[0], x[0]), max(ax.get_xlim()[1], x[-1]))
    ax.set_ylim(min(ax.get_ylim()[0], np.min(y)), max(ax.get_ylim()[1], np.max(y)))

    def callback(*ignore):
        lims = ax.get_xlim()
        update(*lims)

    ax.callbacks.connect('xlim_changed', callback)

    return [line]
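The power-of-two factor chosen inside update can be checked by hand. The sketch below pulls that one expression into a standalone helper (the name `downsample_factor` is made up for illustration) and evaluates it for a few visible ranges.

```python
import numpy as np

def downsample_factor(L, M=5000):
    """Nearest power of two to L / M, never below 1 (same expression as in update)."""
    return max(2 ** int(round(np.log(L / M) / np.log(2))), 1)

# Full view of 100e6 samples: log2(20000) is about 14.29, so the factor is 2**14.
full = downsample_factor(100_000_000)

# Zoomed to 1e6 visible samples: log2(200) is about 7.64, so the factor is 2**8.
zoomed = downsample_factor(1_000_000)

# Fewer visible samples than M: no reduction at all.
tiny = downsample_factor(1_000)

print(full, zoomed, tiny)
```

Because the factor is always a power of two, only a handful of distinct reductions are ever computed, which is what makes the cache effective while zooming.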
Here's some test code:
L=int(100e6)
x=linspace(0,1,L)
y=0.1*randn(L)+sin(2*pi*18*x)
plot_bigly(x,y, red=calc_env)
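On my machine this displays very quickly. Zooming has a bit of lag, especially when zooming by a large amount. Panning has no issues. Using random selection instead of the min and max is quite a bit faster, and only has issues at very high zoom levels.
Answer 2 (score: 2)
EDIT: Added parallel=True to numba... even faster.
I ended up making a hybrid of a single-pass argmin+max routine and the improved indexing from @a_guest's answer, together with the link to this related simultaneous min max question.
This version returns the correct x-values for each min/max y pair and, thanks to numba, is actually a little faster than the "fast but not quite correct" version.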
import numpy as np
from numba import jit, prange

@jit(parallel=True)
def min_max_downsample_v4(x, y, num_bins):
    pts_per_bin = x.size // num_bins
    x_view = x[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    y_view = y[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    i_min = np.zeros(num_bins, dtype='int64')
    i_max = np.zeros(num_bins, dtype='int64')

    # Bins are independent, so the outer loop can run in parallel
    for r in prange(num_bins):
        min_val = y_view[r, 0]
        max_val = y_view[r, 0]
        # Single pass over the bin, tracking both extremes at once
        for c in range(pts_per_bin):
            if y_view[r, c] < min_val:
                min_val = y_view[r, c]
                i_min[r] = c
            elif y_view[r, c] > max_val:
                max_val = y_view[r, c]
                i_max[r] = c

    r_index = np.repeat(np.arange(num_bins), 2)
    c_index = np.sort(np.stack((i_min, i_max), axis=1)).ravel()
    return x_view[r_index, c_index], y_view[r_index, c_index]
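The inner loop's logic can be sanity-checked without numba. This is a plain-Python sketch of the same single-pass scan (the helper name `argmin_argmax_single_pass` is made up), compared against numpy on random rows that deliberately contain many ties.

```python
import numpy as np

def argmin_argmax_single_pass(row):
    """Track the first minimum and first maximum in one pass over `row`."""
    i_min = i_max = 0
    min_val = max_val = row[0]
    for c in range(1, len(row)):
        v = row[c]
        if v < min_val:
            min_val, i_min = v, c
        elif v > max_val:   # a value cannot be both a new min and a new max
            max_val, i_max = v, c
    return i_min, i_max

rng = np.random.default_rng(0)
# Small integer range on purpose, so rows contain repeated extremes.
rows = rng.integers(0, 5, size=(200, 16)).astype(float)

matches = all(
    argmin_argmax_single_pass(r) == (int(np.argmin(r)), int(np.argmax(r)))
    for r in rows
)
print(matches)
```

The strict comparisons keep the first occurrence of each extreme, which is also what np.argmin and np.argmax return. One caveat: NaN values are skipped by the loop (every comparison with NaN is false), whereas numpy returns the index of the first NaN, so the two only agree on NaN-free data.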
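Comparing the speeds with timeit shows that the numba code is roughly 2.6x faster than v1 and gives better results. It is a little over 10x faster than running numpy's argmin and argmax in series.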
%timeit min_max_downsample_v1(x_big ,y_big ,2000)
96 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit min_max_downsample_v2(x_big ,y_big ,2000)
507 ms ± 4.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit min_max_downsample_v3(x_big ,y_big ,2000)
365 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit min_max_downsample_v4(x_big ,y_big ,2000)
36.2 ms ± 487 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
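Answer 3 (score: 0)
Have you tried pyqtgraph for interactive plotting? It is more responsive than matplotlib.
One trick I use for downsampling is to use array_split and compute the min and max of the splits. The split is done according to the number of samples per pixel of the plot area.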
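A minimal sketch of the array_split idea, assuming one bin per pixel column (the function name `min_max_array_split` and the sample values are made up for illustration):

```python
import numpy as np

def min_max_array_split(y, num_bins):
    """Min and max of each of `num_bins` nearly equal chunks of y."""
    # np.array_split tolerates lengths that are not a multiple of num_bins.
    chunks = np.array_split(y, num_bins)
    y_min = np.array([c.min() for c in chunks])
    y_max = np.array([c.max() for c in chunks])
    return y_min, y_max

# Ten samples into four bins gives chunk sizes 3, 3, 2, 2.
y = np.array([3., 1., 4., 1., 5., 9., 2., 6., 5., 3.])
y_min, y_max = min_max_array_split(y, 4)
print(y_min, y_max)
```

Unlike the reshape-based versions, no trailing samples are dropped. The Python-level loop over chunks is cheap for a few thousand bins, but num_bins must not exceed len(y), otherwise some chunks are empty and min/max raise an error.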