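How to exclude rows/columns from numpy.ndarray data

Time: 2014-01-09 14:11:09

Tags: python numpy

Suppose we have a numpy.ndarray of data, say with shape (100,200), and a list of indices that we want to exclude from the data. How would you do that? Something like this: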

import numpy
a = numpy.random.rand(100,200)
indices = numpy.random.randint(100,size=20)
b = a[-indices,:] # imaginary code, what to replace here?

Thanks.

4 answers:

Answer 0: (score: 11)

You can use b = numpy.delete(a, indices, axis=0)

Source: NumPy docs

Answer 1: (score: 4)

You can try:
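
A minimal sketch of the numpy.delete approach, assuming a seeded generator (the seed and the names rng, c and n_removed are only illustrative and not part of the answer). It drops rows with axis=0 and columns with axis=1, and shows that indices listed more than once are removed only once:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
a = rng.random((100, 200))
indices = rng.integers(100, size=20)    # may contain duplicates

# Drop the listed rows (axis=0); a repeated index removes its row only once.
b = np.delete(a, indices, axis=0)

# Drop the listed columns instead (axis=1).
c = np.delete(a, indices, axis=1)

n_removed = np.unique(indices).size     # number of distinct indices
print(b.shape, c.shape)
```

numpy.delete returns a new array and leaves a unchanged.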

import numpy as np
a = np.random.rand(100,200)
indices = np.random.randint(100,size=20)
b = a[np.setdiff1d(np.arange(100),indices),:]

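This avoids creating a mask array of the same size as the data, as is done in https://stackoverflow.com/a/21022753/865169. Note that this example produces a 2D array b, rather than the flattened array produced in that answer.

A crude comparison of runtime and memory cost between this approach and https://stackoverflow.com/a/30273446/865169 seems to suggest that delete is faster, while indexing with setdiff1d is lighter on memory: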

In [75]: %timeit b = np.delete(a, indices, axis=0)
The slowest run took 7.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.7 µs per loop

In [76]: %timeit c = a[np.setdiff1d(np.arange(100),indices),:]
10000 loops, best of 3: 48.4 µs per loop

In [77]: %memit b = np.delete(a, indices, axis=0)
peak memory: 52.27 MiB, increment: 0.85 MiB

In [78]: %memit c = a[np.setdiff1d(np.arange(100),indices),:]
peak memory: 52.39 MiB, increment: 0.12 MiB
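
The two approaches compared above should return the same rows in the same order, because setdiff1d yields the sorted, unique row numbers to keep. A minimal check, assuming a seeded generator (rng, keep and same are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
a = rng.random((100, 200))
indices = rng.integers(100, size=20)

# Approach 1: delete the unwanted rows.
b = np.delete(a, indices, axis=0)

# Approach 2: index with the sorted row numbers that are NOT in `indices`.
keep = np.setdiff1d(np.arange(a.shape[0]), indices)
c = a[keep, :]

same = np.array_equal(b, c)
print(same)
```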

Answer 2: (score: 3)

This is ugly, but it works:

b = np.array([a[i] for i in range(a.shape[0]) if i not in indices])
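
In the list comprehension above, every `i not in indices` test scans the whole index array. A minimal sketch of a variant, assuming it is acceptable to convert the indices to a Python set first (the name excluded is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
a = rng.random((100, 200))
indices = rng.integers(100, size=20)

# A set of plain Python ints gives constant-time membership tests.
excluded = set(indices.tolist())

# Keep every row whose number is not in the excluded set.
b = np.array([a[i] for i in range(a.shape[0]) if i not in excluded])
print(b.shape)
```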

Answer 3: (score: 1)

You can try something like this:

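import numpy
a = numpy.random.rand(100,200)
indices = numpy.random.randint(100,size=20)
mask = numpy.ones(a.shape, dtype=bool) # mask has the same shape as a
mask[indices,:] = False # mark the rows to exclude
b = a[mask] # note: a full-shape boolean mask returns a flattened 1D array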
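
The mask above has the same shape as a, so a[mask] comes back as a 1D array. A minimal sketch of a variant, assuming a one-dimensional mask with one flag per row is acceptable (row_mask, full_mask and flat are illustrative names); it keeps the result 2D and uses a much smaller mask:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
a = rng.random((100, 200))
indices = rng.integers(100, size=20)

# One boolean flag per row instead of one per element.
row_mask = np.ones(a.shape[0], dtype=bool)
row_mask[indices] = False
b = a[row_mask]                         # result stays 2D

# For comparison: the full-shape mask yields a flattened result.
full_mask = np.ones(a.shape, dtype=bool)
full_mask[indices, :] = False
flat = a[full_mask]                     # result is 1D

print(b.shape, flat.shape)
```

Reshaping flat with flat.reshape(-1, a.shape[1]) recovers the same 2D array as b.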