看似简单的问题:我有一个包含两列的数组,第一列表示ID,第二列表示计数。我想用另一个类似的数组更新它
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
a.update(b) # ????
>>> np.array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3]])
有没有办法用索引/切片来做到这一点,这样我不必简单地遍历每一行?
答案 0 :(得分:4)
方法#1 :您可以使用np.add.at
执行此类ID-based
添加操作 -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
要查找a_idx
和b_idx
,可能是更快的替代方案,np.searchsorted
可以像这样使用 -
a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')
示例输入输出:
In [538]: a
Out[538]:
array([[1, 2],
[4, 2],
[3, 1],
[5, 5]])
In [539]: b
Out[539]:
array([[3, 7],
[1, 1],
[4, 0],
[2, 3],
[6, 2]])
In [540]: out
Out[540]:
array([[1, 3],
[2, 3],
[3, 8],
[4, 2],
[5, 5],
[6, 2]])
方法#2 :您可以使用np.bincount
执行相同的ID添加 -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Get all IDs and counts in a single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))
# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)
# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)
# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
如果a
和b
中的ID列已排序,则会变得更容易,因为我们可以使用带有np.in1d
的掩码来索引使用{{1创建的输出ID数组像这样 -
np.union
示例运行 -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
答案 1 :(得分:3)
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[5, 3]])
注意如果您希望结果排序,可以使用np.lexsort
:
result[np.lexsort((result[:,0],result[:,0]))]
说明:
首先,您可以使用以下命令找到唯一ID:
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])
然后找出id a
和所有id之间的差异:
>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])
然后使用b
中的ID找到diff
中的项目:
>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])
最后将结果与列表a
连接起来:
>>> np.concatenate((a,val))
考虑另一个排序示例:
>>> a = np.array([[1, 2],
... [2, 2],
... [3, 1],
... [7, 5]])
>>>
>>> b = np.array([[2, 2],
... [3, 1],
... [4, 0],
... [5, 3]])
>>>
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
[2, 2],
[3, 1],
[4, 0],
[5, 3],
[7, 5]])