Question

I need to parse the time-dependent positions of two objects, and I get the data in a numpy array:

import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

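Here the first column is the position, the second column is the time at which point A was at that position, and the last column is the time at which point B was at that position. The data is guaranteed to be consistent, i.e. any two rows have the same time if and only if they have the same position. In pseudocode: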

data[row1,1] == data[row2,1]  <=>  data[row1,0] == data[row2,0]
data[row1,2] == data[row2,2]  <=>  data[row1,0] == data[row2,0]
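
The guarantee above can be checked mechanically. The following is only a sketch: `is_consistent` is a hypothetical helper name, not something from the original post, and it assumes `data` is an integer array laid out as described (column 0 is the position, `time_col` is one of the time columns).

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

def is_consistent(data, time_col):
    """Return True if positions and the times in `time_col` map one-to-one.

    Hypothetical helper for illustration only.
    """
    # Distinct (position, time) pairs that occur in the data
    pairs = np.unique(data[:, [0, time_col]], axis=0)
    n_pos = np.unique(data[:, 0]).size
    n_time = np.unique(data[:, time_col]).size
    # One-to-one mapping: as many pairs as positions and as many as times
    return pairs.shape[0] == n_pos == n_time

print(is_consistent(data, 1), is_consistent(data, 2))
```

If either call returns False, a time value maps to more than one position (or the reverse), and the rearrangement discussed below would silently overwrite entries.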

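What I would like is some way to recast this array so that all available times are enumerated together with the corresponding positions, e.g.: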

parsed = [[1, 0, 2],
          [2, 2, 0],
          [3, np.nan, 1],
          [4, 1, np.nan]]

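Here the first column is the time, the second column is the position of point A, and the third column is the position of point B. Where I have no information about a point's position, np.nan should be assigned. What I currently do is split the data array into two separate arrays: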

    moments = set(data[:, 1:3].flatten())

    for each in moments:
        a = data[:, [1, 0]][data[:, 1] == each]
        b = data[:, [2, 0]][data[:, 2] == each]

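I then merge them back together as in John Galt's answer here. This works to some extent, but I really hope there is a better solution. Can anyone point me in the right direction?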

Answer 1

Here's one approach using NumPy array initialization and assignment:

# Gather a and b indices. Get their union, which represents all possible indices
a_idx = data[:,1]
b_idx = data[:,2]
all_idx = np.union1d(a_idx, b_idx)

# Setup o/p array 
out = np.full((all_idx.size,3),np.nan)

# Assign all indices to first col
out[:,0] = all_idx

# Determine the positions of a indices in all indices and assign first col data
out[np.searchsorted(all_idx, a_idx),1] = data[:,0]
# Similarly for b
out[np.searchsorted(all_idx, b_idx),2] = data[:,0]

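np.searchsorted is the godsend here: it gives us the positions at which the a and b values from data need to be placed in the already sorted array all_idx, and it is known to be very efficient.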
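
To make the role of np.searchsorted concrete, here is a small sketch on the sample data from the question. The names `rows_a` and `rows_b` are introduced only for illustration. Because every time value is present in `all_idx`, the insertion index that np.searchsorted returns is exactly the row of that time in the output.

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

# Sorted, unique union of all time stamps
all_idx = np.union1d(data[:, 1], data[:, 2])

# Output row for each time at which A (resp. B) was observed
rows_a = np.searchsorted(all_idx, data[:, 1])
rows_b = np.searchsorted(all_idx, data[:, 2])

print(all_idx)  # [1 2 3 4]
print(rows_a)   # [0 3 1]
print(rows_b)   # [1 2 0]
```

For example, A was at position 1 at time 4, which is the last entry of `all_idx`, so that position lands in output row 3.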

Output for the given sample data:

In [104]: out
Out[104]: 
array([[  1.,   0.,   2.],
       [  2.,   2.,   0.],
       [  3.,  nan,   1.],
       [  4.,   1.,  nan]])
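
The same steps can be wrapped in a reusable function. This is a sketch rather than part of the original answer: `positions_by_time` is a hypothetical name, and it assumes column 0 holds the position while every remaining column holds the times of one tracked point, so it also handles more than two points.

```python
import numpy as np

def positions_by_time(data):
    """Rearrange [position, time_A, time_B, ...] rows into [time, pos_A, pos_B, ...].

    Hypothetical helper; assumes each time column is consistent with column 0.
    """
    data = np.asarray(data)
    # Sorted union of all time stamps across the time columns
    times = np.unique(data[:, 1:])
    # Float output so that missing entries can hold NaN
    out = np.full((times.size, data.shape[1]), np.nan)
    out[:, 0] = times
    for col in range(1, data.shape[1]):
        # Row of each of this point's times within the sorted union
        rows = np.searchsorted(times, data[:, col])
        out[rows, col] = data[:, 0]
    return out

parsed = positions_by_time([[0, 1, 2],
                            [1, 4, 3],
                            [2, 2, 1]])
print(parsed)
```

Note that the result is a float array even when the input is integer, since np.nan cannot be stored in an integer array.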

Answer 2

For lack of a better idea, let me throw in a pandas one-liner. Disclaimer: it is about 100 times slower than Divakar's pure NumPy solution:

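import pandas as pd

# Index the position column (0) by each time column in turn, then align on time.
# sort_index() makes the time ordering explicit; newer pandas versions may not
# sort the combined index by default.
df = pd.DataFrame(data)
pd.concat([df.set_index(ix)[0] for ix in [1,2]], axis=1).sort_index().reset_index().values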
#array([[  1.,   0.,   2.],
#       [  2.,   2.,   0.],
#       [  3.,  nan,   1.],
#       [  4.,   1.,  nan]])

Rearranging a numpy array
