使用另一个

时间:2015-06-04 20:01:58

标签: python arrays numpy

看似简单的问题:我有一个包含两列的数组,第一列表示ID,第二列表示计数。我想用另一个类似的数组更新它

import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

a.update(b)  # ????
>>> np.array([[1, 2],
              [2, 4],
              [3, 2],
              [4, 5],
              [5, 3]])

有没有办法用索引/切片来做到这一点,这样我不必简单地遍历每一行?

2 个答案:

答案 0 :(得分:4)

通用案例

方法#1 :您可以使用np.add.at执行此类ID-based添加操作 -

# First column of output array as the union of first columns of a,b              
out_id = np.union1d(a[:,0],b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)

# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

要查找a_idxb_idx,可能是更快的替代方案,np.searchsorted可以像这样使用 -

a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')

示例输入输出:

In [538]: a
Out[538]: 
array([[1, 2],
       [4, 2],
       [3, 1],
       [5, 5]])

In [539]: b
Out[539]: 
array([[3, 7],
       [1, 1],
       [4, 0],
       [2, 3],
       [6, 2]])

In [540]: out
Out[540]: 
array([[1, 3],
       [2, 3],
       [3, 8],
       [4, 2],
       [5, 5],
       [6, 2]])

方法#2 :您可以使用np.bincount执行相同的ID添加 -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Get all IDs and counts in a single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))

# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)

# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)

# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

具体案例

如果ab中的ID列已排序,则会变得更容易,因为我们可以使用带有np.in1d的掩码来索引使用{{1创建的输出ID数组像这样 -

np.union

示例运行 -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

答案 1 :(得分:3)

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [5, 3]])

注意如果您希望结果排序,可以使用np.lexsort

result[np.lexsort((result[:,0],result[:,0]))]

说明:

首先,您可以使用以下命令找到唯一ID:

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])

然后找出id a和所有id之间的差异:

>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])

然后使用b中的ID找到diff中的项目:

>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])

最后将结果与列表a连接起来:

>>> np.concatenate((a,val))

考虑另一个排序示例:

>>> a = np.array([[1, 2],
...               [2, 2],
...               [3, 1],
...               [7, 5]])
>>> 
>>> b = np.array([[2, 2],
...               [3, 1],
...               [4, 0],
...               [5, 3]])
>>> 
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]

>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [7, 5]])