Question

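I want to reduce the amount of data I have for plotting and analysis. I am using Python and NumPy. The data is unevenly sampled, so there is one array of timestamps and one array of corresponding values. I want at least a certain amount of time between data points. I have a simple solution written in plain Python that finds the indices of samples that are at least one second apart: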

import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0 ]) # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0 ])

idx = [0]
last_t = t[0]
min_dif = 1.0 # Minimum distance between samples in time
for i in range(1, len(t)):
    if last_t + min_dif <= t[i]:
        last_t = t[i]
        idx.append(i)

If we look at the result:

--> print idx
[0, 4, 5, 6, 9]

--> print t[idx]
[ 0.  1.  2.  4.  5.]

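The question is how to do this more efficiently, especially if the arrays are really long. Is there a built-in NumPy or SciPy method that does something similar?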

Answer 1

Like @1443118, I would suggest using pandas, but you may want to try np.histogram.

First, work out the bins you need (intervals of width min_dif):

>>> bins = np.arange(t[0], t[-1]+min_dif, min_dif) - 1e-12

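The t[-1]+min_dif is there to make sure we take the last point. The -1e-12 is a hack to keep the 4.0 of your example from being counted in the wrong bin: it is just a small offset that makes sure the intervals are closed on the right.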

>>> (counts, _) = np.histogram(t, bins)
>>> counts
array([4, 1, 1, 0, 3])
>>> counts.cumsum()
array([4, 5, 6, 6, 9])

So v[0:4] is your first sample, v[4:5] is your second... you get the idea.
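
To turn those counts into an index array like the one built in the question, one option is to prepend a zero to the cumulative sum and drop duplicates. This is only a sketch: the names `starts` and `idx` and the filtering step are my own additions, and because the bins are fixed, the gap between picked samples is not guaranteed to be at least min_dif in general.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
min_dif = 1.0

# Same bins as above: one bin per min_dif interval, shifted slightly left.
bins = np.arange(t[0], t[-1] + min_dif, min_dif) - 1e-12
counts, _ = np.histogram(t, bins)

# Each cumulative count is the position where the next bin starts in t.
starts = np.concatenate(([0], counts.cumsum()))

# Empty bins repeat a start position; np.unique removes the repeats.
# The bounds check guards against a start that points past the end of t.
idx = np.unique(starts[starts < len(t)])

print(idx)
print(t[idx])
```

For the example data this picks the same samples as the loop in the question.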

Answer 2

A simple solution is to interpolate, using for example numpy.interp:

vsampled = numpy.interp(numpy.arange(t[0], t[-1]), t, v)

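This will not give you the indices of the values. However, it will generate values by interpolation even at points in time for which the input arrays contain no data.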
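
As a small illustration with the data from the question, the sketch below resamples onto a one-second grid. The names `t_uniform` and `v_uniform` are mine, and the half-step added to the stop value is an assumption made so that the grid also includes the last timestamp.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0

# Evenly spaced time grid; the extra half step keeps t[-1] inside the range.
t_uniform = np.arange(t[0], t[-1] + min_dif / 2, min_dif)

# Linear interpolation of v onto the grid (t must be increasing).
v_uniform = np.interp(t_uniform, t, v)

print(t_uniform)
print(v_uniform)
```

Note that the value at t = 3.0 is interpolated between the samples at 2.0 and 4.0, since there is no measurement in that second.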

Answer 3

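I cannot think of a solution that does exactly what you want, but this should approximate it without interpolating, even if it does not seem entirely sensible to me. It gives at most one value per second (the leftmost one):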

# Assuming that t is sorted...
# Create all full seconds.
seconds = np.arange(int(t[0]), int(t[-1]) + 1)

# find the indexes for all
idx = np.searchsorted(t, seconds)
idx = np.unique(idx) # there might be duplicates if a second has no data in it.

For your example it gives the same result, but in general it will allow smaller or larger gaps (anything between 0 and 2 seconds)...
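
To make the gap issue concrete, the sketch below compares the grid approach with a variant that keeps the exact minimum gap by jumping through the sorted timestamps with np.searchsorted. Both function names are hypothetical, and the second function is my own variation, not part of the approach above. It still loops in Python, but only once per kept sample rather than once per input sample.

```python
import numpy as np

def grid_indices(t):
    """Leftmost sample in each full second (the approach above)."""
    seconds = np.arange(int(t[0]), int(t[-1]) + 1)
    return np.unique(np.searchsorted(t, seconds))

def min_gap_indices(t, min_dif):
    """Indices of samples at least min_dif apart; t must be sorted and non-empty."""
    idx = [0]
    i = 0
    while True:
        # First position whose timestamp is >= last kept time + min_dif.
        i = int(np.searchsorted(t, t[i] + min_dif, side="left"))
        if i >= len(t):
            break
        idx.append(i)
    return np.array(idx)

t_demo = np.array([0.9, 1.0, 2.9, 3.0])
grid = grid_indices(t_demo)
exact = min_gap_indices(t_demo, 1.0)

print(np.diff(t_demo[grid]))   # gaps well below and well above one second
print(np.diff(t_demo[exact]))  # every gap is at least one second
```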

Answer 4

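I suggest you use pandas. It makes it very easy to generate a regularly spaced time series and then resample the data to some specific frequency. See this and look at the subsection on resampling, about halfway down the page.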
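
A minimal sketch of what that could look like for the data in the question, assuming the timestamps are seconds and that taking the first sample in each one-second bin is acceptable. The variable names and the choice of `first()` as the aggregation are my assumptions. Other aggregations such as `mean()` work the same way.

```python
import numpy as np
import pandas as pd

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])

# Index the values by elapsed time so pandas can resample them.
s = pd.Series(v, index=pd.to_timedelta(t, unit="s"))

# One row per second; seconds without any data come out as NaN.
first_per_second = s.resample("1s").first()

print(first_per_second)
```

Like the fixed-bin approaches, this groups samples by whole seconds, so it does not enforce a minimum gap between the kept samples.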

Generating an evenly sampled array from unevenly sampled data in NumPy
