Question

The following function finds the unique rows of an array:

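import numpy as np

def unique_rows(a):
    # View each row as one opaque np.void item so np.unique compares whole rows byte by byte
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    # idx holds the index of the first occurrence of each distinct row
    _, idx = np.unique(b, return_index=True)
    unique_a = a[idx]
    return unique_a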

For example,

test = np.array([[1,0,1],[1,1,1],[1,0,1]])
unique_rows(test)
[[1,0,1],[1,1,1]]
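For cross-checking, NumPy 1.13 and later can deduplicate rows directly with `np.unique(..., axis=0)` (this is not part of the original question). The sketch also shows one case where the void-view trick can disagree with a numeric comparison: because it works on raw bytes, values that are numerically equal but have different bit patterns, such as `0.0` and `-0.0`, count as different rows.

```python
import numpy as np

test = np.array([[1, 0, 1], [1, 1, 1], [1, 0, 1]])

# np.unique with axis=0 treats each row as a single element (NumPy >= 1.13).
rows = np.unique(test, axis=0)
print(rows.tolist())  # [[1, 0, 1], [1, 1, 1]]

# The void view in unique_rows compares raw bytes, not numeric values.
zero = np.float64(0.0)
neg_zero = np.float64(-0.0)
print(zero == neg_zero)                      # True: equal as numbers
print(zero.tobytes() == neg_zero.tobytes())  # False: different bit patterns
```

For integer data like `test`, both approaches should agree; the byte-level difference only matters for floats.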

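I believe this function should always work, but it may not be watertight. In my code, I want to count how many unique positions a set of particles occupies. The particles are stored in a 2D array, where each row is one particle's position. The positions are of type np.float64. I have also defined the following function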

def pos_tag(pos):
    # Encode a position (x, y, z) as the single number 2**x * 3**y * 5**z
    x,y,z = pos[:,0],pos[:,1],pos[:,2]
    return (2**x)*(3**y)*(5**z)

In principle, this function should produce a unique value for any (x, y, z) position.
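The uniqueness argument rests on unique prime factorization, which only holds when x, y and z are integers and the product is computed exactly. With float64 coordinates neither is guaranteed. A minimal sketch of two ways distinct positions can share a tag (the sample positions are made up for illustration):

```python
import numpy as np

def pos_tag(pos):
    x, y, z = pos[:, 0], pos[:, 1], pos[:, 2]
    return (2**x) * (3**y) * (5**z)

# 1) Large exponents overflow float64 to inf, so different rows get the same tag.
big = np.array([[2000.0, 0.0, 0.0],
                [2001.0, 0.0, 0.0]])
with np.errstate(over="ignore"):
    tags_big = pos_tag(big)
print(tags_big)  # [inf inf]

# 2) For non-integer exponents the tag equals exp(x*ln2 + y*ln3 + z*ln5), and such a
#    linear combination can take the same value for different (x, y, z):
#    (log2(3), 0, 0) and (0, 1, 0) both give a tag of about 3.
frac = np.array([[np.log2(3.0), 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
tags_frac = pos_tag(frac)
print(np.isclose(tags_frac[0], tags_frac[1]))  # True
```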

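However, when I use these two functions to count the unique positions in my particle set, they give different answers. Is this because of a logical flaw in the first function, or because the second function does not produce a unique value for every position?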

Edit: usage example

I have a long piece of code that generates a 2D array of particle positions.

partpos.shape = (6039539,3)

I then count the unique rows as follows

len(unique_rows(partpos))
6034411

and

posids = pos_tag(partpos)
len(np.unique(posids))
5328871
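One way to see which count is off is to look for distinct rows that map to the same tag. A sketch, assuming NumPy >= 1.13 for `np.unique(..., axis=0)`; `tag_collision_pairs` and the small sample array are hypothetical, not from the question. If `pos_tag` is the function losing positions, running this on `partpos` should turn up concrete examples.

```python
import numpy as np

def pos_tag(pos):
    x, y, z = pos[:, 0], pos[:, 1], pos[:, 2]
    return (2**x) * (3**y) * (5**z)

def tag_collision_pairs(pos, limit=5):
    """Return up to `limit` pairs of distinct rows that share the same pos_tag value."""
    # Keep one copy of each distinct row (NumPy >= 1.13).
    uniq_rows = np.unique(pos, axis=0)
    tags = pos_tag(uniq_rows)
    # After sorting by tag, equal neighbours are distinct rows with the same tag.
    order = np.argsort(tags, kind="stable")
    sorted_tags = tags[order]
    hits = np.flatnonzero(sorted_tags[1:] == sorted_tags[:-1])[:limit]
    return [(uniq_rows[order[i]], uniq_rows[order[i + 1]]) for i in hits]

# Hypothetical data: 2**1e-20 rounds to exactly 1.0 in float64.
pos = np.array([[0.0, 0.0, 0.0],
                [1e-20, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
pairs = tag_collision_pairs(pos)
print(len(pairs))  # 1
print(pairs[0])
```

Sorting the tags once keeps this roughly O(n log n), which matters for an array with about six million rows. NaN tags never compare equal, so they would not be reported.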

Answer 1

I think the difference is caused by precision errors. With the code

print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos))))

6034411
6034411

However,

print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos.astype(np.float32)))))

6034411
5328871
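A minimal sketch of the precision effect described above, using two made-up positions that differ by 1e-9 in x. The rows stay distinct in both float64 and float32, but 2**1e-9 is about 1 + 6.9e-10, which is below float32 resolution near 1.0 (about 1.2e-7) yet far above float64 resolution (about 2.2e-16). With a float32 array, NumPy keeps `2**x` in float32, so the whole tag is computed at single precision.

```python
import numpy as np

def pos_tag(pos):
    x, y, z = pos[:, 0], pos[:, 1], pos[:, 2]
    return (2**x) * (3**y) * (5**z)

# Two hypothetical positions that differ only by a tiny x offset.
pos64 = np.array([[0.0, 0.0, 0.0],
                  [1e-9, 0.0, 0.0]])
pos32 = pos64.astype(np.float32)

# The rows stay distinct in both precisions...
print(len(np.unique(pos64, axis=0)), len(np.unique(pos32, axis=0)))  # 2 2

# ...but the float32 tags collapse to the same value.
tags64 = pos_tag(pos64)
tags32 = pos_tag(pos32)
print(tags32.dtype)  # float32
print(len(np.unique(tags64)), len(np.unique(tags32)))  # 2 1
```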

Answer 2

a = [[1,0,1],[1,1,1],[1,0,1]]

# Convert rows to tuples so they're hashable, creating a generator thereof
b = (tuple(row) for row in a)

# Convert back to list of lists, after coercing to a set to eliminate non-unique rows
unique_rows = list(list(row) for row in set(b))
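Applied to a NumPy array, the same set-of-tuples idea can count unique rows. A sketch; the helper name `count_unique_rows_set` is hypothetical. Going through `tolist()` gives plain Python numbers to hash. A set does not preserve row order, whereas `dict.fromkeys` (insertion-ordered since Python 3.7) keeps the first occurrence of each row. For millions of rows this Python-level loop is likely to be much slower than the vectorized NumPy approaches.

```python
import numpy as np

def count_unique_rows_set(arr):
    # Each row becomes a hashable tuple; the set keeps one copy per distinct row.
    return len({tuple(row) for row in arr.tolist()})

pos = np.array([[1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
print(count_unique_rows_set(pos))  # 2

# Order-preserving variant on the plain list from the answer.
a = [[1, 0, 1], [1, 1, 1], [1, 0, 1]]
ordered = [list(row) for row in dict.fromkeys(tuple(row) for row in a)]
print(ordered)  # [[1, 0, 1], [1, 1, 1]]
```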

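Edit: Well, that's embarrassing. I just realized I didn't actually answer the question. It may still be what the OP is looking for, so I'll leave it up, but it isn't a real answer. Sorry.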
