Identify vectors with the same value in one column using numpy in Python

Date: 2015-05-12 11:26:34

Tags: python arrays numpy indexing

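I have a large 2-D array of vectors. I want to split this array into several arrays according to one of the vectors' elements (that is, one column). Each run of consecutive rows with the same value in that column should become its own smaller array. For example, using the third column: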

orig = np.array([[1, 2, 3], 
                 [3, 4, 3], 
                 [5, 6, 4], 
                 [7, 8, 4], 
                 [9, 0, 4], 
                 [8, 7, 3], 
                 [6, 5, 3]])

I would like to turn this into three arrays, made up of rows 1-2, rows 3-5, and rows 6-7:

>>> a
array([[1, 2, 3],
       [3, 4, 3]])

>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])

>>> c
array([[8, 7, 3],
       [6, 5, 3]])

I am new to Python and numpy. Any help would be greatly appreciated.

Regards, Mat

Edit: I reformatted the array to clarify the problem.

3 Answers:

Answer 0 (score: 7)

Use np.split:

>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)

>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
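
To make the one-liner easier to follow, here is a minimal sketch that breaks it into steps, using the array from the question. The intermediate names (col, changed, boundaries, parts) are introduced only for illustration and are not part of the original answer.

```python
import numpy as np

# The array from the question; column index 2 is the grouping column.
orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

col = orig[:, 2]
# Compare every value with its successor: True where the next row starts a new run.
changed = col[:-1] != col[1:]
# np.where gives the positions of the True entries; +1 turns each into
# the index of the first row of the following run.
boundaries = np.where(changed)[0] + 1
# np.split cuts the array along axis 0 at those row indices.
parts = np.split(orig, boundaries)

print(boundaries)  # [2 5]
for part in parts:
    print(part)
```

Unpacking into exactly three names (a, b, c) only works when the column happens to contain three runs; keeping the result as a list, as above, avoids that assumption.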

Answer 1 (score: 0)

Nothing fancy here, but this good old-fashioned loop should do the trick:

import numpy as np

a = np.array([[1, 2, 3], 
              [1, 2, 3], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 3], 
              [1, 2, 3]])
groups = []
rows = [a[0]]  # rows of the current run, collected in a plain list
prev = a[0][-1]  # assumes grouping on the last column; change the index if that is not the case
for row in a[1:]:
    if row[-1] == prev:
        rows.append(row)
    else:
        # the value changed: close the current run and start a new one
        groups.append(np.array(rows))
        rows = [row]
    prev = row[-1]
groups.append(np.array(rows))  # do not forget the final run

print(groups)

## [array([[1, 2, 3],
##         [1, 2, 3]]),
##  array([[1, 2, 4],
##         [1, 2, 4],
##         [1, 2, 4]]),
##  array([[1, 2, 3],
##         [1, 2, 3]])]
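
If a pure-Python loop is acceptable, the same grouping can also be sketched with itertools.groupby from the standard library, which collects runs of consecutive items that share a key. This is a different technique from the hand-written loop above, shown here only as an alternative; it assumes, as above, that the grouping column is the last one.

```python
from itertools import groupby

import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

# Iterating over a 2-D array yields its rows. groupby starts a new group
# each time the key (here the last element of the row) changes.
groups = [np.array(list(run)) for _, run in groupby(a, key=lambda row: row[-1])]

for group in groups:
    print(group)
```

Because groupby only merges adjacent items, the two separate runs ending in 3 stay separate, which is the behaviour the question asks for.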

Answer 2 (score: 0)

If a looks like this:

array([[1, 1, 2, 3],
       [2, 1, 2, 3],
       [3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4],
       [6, 1, 2, 3],
       [7, 1, 2, 3]])

then this:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)

Result:

[array([[1, 1, 2, 3],
       [2, 1, 2, 3]]), array([[3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4]]), array([[6, 1, 2, 3],
       [7, 1, 2, 3]])]

Update: np.split() is better. There is no need to add the first and last indices:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)
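
The np.split version can be wrapped in a small reusable function. The sketch below is one way to do that; split_by_runs and its col parameter are hypothetical names chosen for illustration and are not part of numpy.

```python
import numpy as np


def split_by_runs(arr, col=-1):
    """Split a 2-D array into runs of consecutive rows with equal arr[:, col].

    Hypothetical helper for illustration; not a numpy function.
    """
    arr = np.asarray(arr)
    values = arr[:, col]
    # Indices of the first row of every run except the first one.
    boundaries = np.flatnonzero(values[:-1] != values[1:]) + 1
    return np.split(arr, boundaries)


a = np.array([[1, 1, 2, 3],
              [2, 1, 2, 3],
              [3, 1, 2, 4],
              [4, 1, 2, 4],
              [5, 1, 2, 4],
              [6, 1, 2, 3],
              [7, 1, 2, 3]])

res = split_by_runs(a)          # group on the last column
single = split_by_runs(a[:2])   # only one run: a list with a single array

print([len(r) for r in res])  # [2, 3, 2]
print(len(single))            # 1
```

When the column never changes, boundaries is empty and np.split returns a list holding the whole array, so callers can always iterate over the result.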