I have two numpy arrays, for example:
import numpy as np
a1 = np.linspace(0,2*np.pi,101)
a2 = np.random.choice(a1, 60)
I need to count how many times each value of a1 occurs in a2. I could do this with a loop, but I am hoping for a better solution.
Loop solution:
a3 = np.zeros_like(a1)
for i in range(len(a1)):
    a3[i] = np.sum(a2==a1[i])
Answer 0 (score: 2)
Another approach, using np.unique:
>>> import numpy as np
>>> a1 = np.linspace(0,2*np.pi,101)
>>> a2 = np.random.choice(a1, 60)
>>>
>>> unq, idx, cnts = np.unique(np.concatenate([a1, a2]), return_inverse=True, return_counts=True)
>>> assert np.all(unq[idx[:len(a1)]] == a1)
>>> result = cnts[idx[:len(a1)]] - 1
>>> result
array([0, 0, 2, 0, 2, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 1, 0, 0, 0, 1, 1, 1, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 0, 1, 0, 2,
0, 1, 2, 1, 2, 0, 0, 1, 0, 0, 0, 0, 0, 4, 0, 0, 0, 1, 1, 1, 0, 0,
2, 0, 0, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 2, 0, 0, 0, 1, 1, 1, 1,
0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 2, 1])
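The subtraction of 1 works because concatenating a1 in front of a2 guarantees that every a1 value appears in the unique set, at the cost of one extra occurrence each. A minimal sketch on a small hand-made input (these arrays are illustrative, not from the original post) makes the bookkeeping visible. It assumes a1 contains no repeated values, as is the case for np.linspace.

```python
import numpy as np

# Small illustrative inputs; a1 is assumed to hold distinct values
a1 = np.array([0.0, 0.5, 1.0, 1.5])
a2 = np.array([0.5, 1.5, 0.5, 0.5])

unq, idx, cnts = np.unique(np.concatenate([a1, a2]),
                           return_inverse=True, return_counts=True)

# cnts counts occurrences in a1 and a2 together; each a1 value
# contributes exactly one of them, hence the "- 1"
result = cnts[idx[:len(a1)]] - 1
print(result)
```

If a1 could contain duplicates, the counts would be inflated by more than one and this correction would no longer hold.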
Answer 1 (score: 2)
Here is a performance comparison of the different solutions, made with perfplot (https://github.com/nschloe/perfplot):
import perfplot
import numpy as np
from collections import Counter
def count_a_in_b_loop(b, a=np.linspace(0,2*np.pi,101)):
    # Baseline: one full pass over b for every element of a
    c = np.zeros_like(a)
    for i in range(len(a)):
        c[i] = np.sum(b==a[i])
    return c
def count_a_in_b_counter(b, a=np.linspace(0,2*np.pi,101)):
    # Counter returns 0 for missing keys, so no membership test is needed
    c = Counter(b)
    return np.array([c[k] for k in a])
def count_occ(a2,a1=np.linspace(0,2*np.pi,101),use_closeness=True):
    # Trace back indices for each elem of a2 in a1
    idx = np.searchsorted(a1,a2)
    # Set out of bounds indices to something within
    idx[idx==len(a1)] = 0
    # Check for the matches
    if use_closeness:
        mask = np.isclose(a1[idx],a2)
    else:
        mask = a1[idx] == a2
    # Get counts
    return np.bincount((idx+1)*mask,minlength=len(a1)+1)[1:]
def count_broadcasting(a2, a1=np.linspace(0,2*np.pi,101)):
    # For exact matches, use (a1[:,None]==a2).sum(1) instead
    return np.isclose(a1[:,None],a2).sum(1) # For close matches
def count_occ_rounding(a2, a1_lastnum=2*np.pi, a1_num_sample=101):
    # Grid spacing of a1, assuming it starts at 0
    s = a1_lastnum/(a1_num_sample-1)
    p = np.round(a2/s).astype(int)
    return np.bincount(p,minlength=a1_num_sample)
def count_add_to_unique(a2, a1=np.linspace(0,2*np.pi,101)):
    unq, idx, cnts = np.unique(np.concatenate([a1, a2]), return_inverse=True, return_counts=True)
    #assert np.all(unq[idx[:len(a1)]] == a1)
    return cnts[idx[:len(a1)]] - 1
perfplot.show(
    setup=lambda n: np.random.choice(np.linspace(0,2*np.pi,101), n),
    kernels=[
        count_a_in_b_loop, count_a_in_b_counter, count_occ,
        count_broadcasting, count_occ_rounding, count_add_to_unique
    ],
    labels=['loop', 'counter','searchsorted','broadcasting','occ_rounding','add_to_unique'],
    n_range=[2**k for k in range(15)],
    xlabel='len(a)'
)
Answer 2 (score: 1)
If I understand your question correctly, here is one way using np.unique and np.isin:
import numpy as np
a1 = np.linspace(0,2*np.pi,101)
a2 = np.random.choice(a1, 60)
vals_counts = np.unique(a2, return_counts=True)
arr = np.array(list(zip(*vals_counts)))
print(arr.shape)
# (46, 2)
res = arr[np.where(np.isin(arr[:, 0], a1))]
print(res.shape)
# (46, 2)
print(res)
[[ 0.06283185 1. ]
[ 0.12566371 1. ]
...
[ 5.65486678 3. ]
[ 5.96902604 2. ]
[ 6.09468975 1. ]
[ 6.1575216 1. ]
[ 6.28318531 1. ]]
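Note that this result only lists the values that actually occur in a2 (46 rows in the run above), whereas the loop in the question produces one count per element of a1, zeros included. A minimal sketch of how the pairs could be scattered back into a full-length array follows. It assumes a1 is sorted and that every value in a2 is an exact element of a1; the seeded generator and the name full are illustrative only.

```python
import numpy as np

a1 = np.linspace(0, 2*np.pi, 101)
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
a2 = rng.choice(a1, 60)

vals, counts = np.unique(a2, return_counts=True)

# Locate each distinct value in the sorted a1 and write its count there;
# positions that never occur in a2 keep their initial 0
full = np.zeros(len(a1), dtype=int)
full[np.searchsorted(a1, vals)] = counts
print(full)
```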
Answer 3 (score: 1)
Approach #1
A vectorized solution based on np.searchsorted, which also accounts for closeness of floating-point numbers, would be -
def count_occ(a1,a2,use_closeness=True):
    # Trace back indices for each elem of a2 in a1
    idx = np.searchsorted(a1,a2)
    # Set out of bounds indices to something within
    idx[idx==len(a1)] = 0
    # Check for the matches
    if use_closeness:
        mask = np.isclose(a1[idx],a2)
    else:
        mask = a1[idx] == a2
    # Get counts
    return np.bincount((idx+1)*mask,minlength=len(a1)+1)[1:]
Sample run -
In [154]: a1 = np.array([1.0000001,4,5,6])
In [155]: a2 = np.array([2,5,8,5,8,5,0.999999999999])
In [156]: count_occ(a1,a2)
Out[156]: array([1, 0, 3, 0])
In [157]: count_occ(a1,a2,use_closeness=False)
Out[157]: array([0, 0, 3, 0])
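One caveat worth noting: with its default side='left', np.searchsorted returns the first index whose a1 value is greater than or equal to the query. A query that sits marginally above an a1 element is therefore compared against the next element, and the close match is missed. A minimal sketch of a variant that compares against the nearer of the two neighbours follows. count_occ_nearest is a hypothetical name and not part of the original answer, and a1 is assumed to be sorted.

```python
import numpy as np

def count_occ_nearest(a1, a2):
    """Hypothetical variant: match each a2 value to its nearest a1 neighbour."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    # Candidate neighbours on either side of the insertion point
    right = np.clip(np.searchsorted(a1, a2), 0, len(a1) - 1)
    left = np.clip(right - 1, 0, len(a1) - 1)
    # Keep whichever neighbour is closer to the query
    idx = np.where(np.abs(a1[left] - a2) < np.abs(a1[right] - a2), left, right)
    # Count only the queries that are actually close to that neighbour
    mask = np.isclose(a1[idx], a2)
    return np.bincount(idx[mask], minlength=len(a1))

a1 = np.array([1.0, 4.0, 5.0, 6.0])
a2 = np.array([1.0000001, 5.0, 4.9999999, 8.0])
out = count_occ_nearest(a1, a2)
print(out)
```

Here 1.0000001 lies just above a1[0], so the left-sided search alone would compare it with 4.0 and drop it.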
Approach #2
Alternatively, we could use broadcasting for a short but memory-intensive approach, like so -
(a1[:,None]==a2).sum(1) # For exact matches
np.isclose(a1[:,None],a2).sum(1) # For close matches
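The memory cost comes from the intermediate boolean array, which has one entry per (a1, a2) pair. A small sketch with illustrative inputs (not from the original post) shows the shape involved:

```python
import numpy as np

a1 = np.array([1.0, 4.0, 5.0, 6.0])
a2 = np.array([2.0, 5.0, 8.0, 5.0])

# a1[:, None] has shape (4, 1); comparing it with a2 of shape (4,)
# broadcasts to a (len(a1), len(a2)) boolean matrix
eq = a1[:, None] == a2
print(eq.shape)

# Summing along axis 1 counts the matches in a2 for each a1 value
counts = eq.sum(1)
print(counts)
```

For large inputs this matrix grows as len(a1) * len(a2), which is why the searchsorted approach is usually the safer choice there.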
Approach #3: Specific case of a1 as linearly spaced data
For the specific case where a1 is a linearly spaced array, and again considering closeness, we can optimize further by rounding the a2 data, like so -
def count_occ_rounding(a2, a1_startnum=0,a1_lastnum=2*np.pi, a1_num_sample=101):
    # Grid spacing of the linearly spaced a1
    s = (a1_lastnum-a1_startnum)/(a1_num_sample-1)
    # Grid index of each a2 value
    p = np.round((a2 - a1_startnum)/s).astype(int)
    return np.bincount(p,minlength=a1_num_sample)
Sample run to verify the output with a generic start, end ranged array for a1 -
In [284]: a1 = np.linspace(-2*np.pi,2*np.pi,201)
...: a2 = np.random.choice(a1, 60)
...: out1 = count_occ_rounding(a2, -2*np.pi, 2*np.pi, 201)
...: out2 = np.isclose(a1[:,None],a2).sum(1)
...: print(np.allclose(out1, out2))
True
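To make the index arithmetic concrete, here is a minimal worked sketch on a tiny grid. The grid and the names step and pos are illustrative assumptions, not from the original answer.

```python
import numpy as np

# Illustrative grid: 5 points on [0, 1], so the step is 0.25
start, stop, num = 0.0, 1.0, 5
a1 = np.linspace(start, stop, num)
a2 = np.array([0.25, 0.75, 0.25, 1.0])

step = (stop - start) / (num - 1)
# Dividing the offset from `start` by the step recovers each grid index
pos = np.round((a2 - start) / step).astype(int)
counts = np.bincount(pos, minlength=num)
print(pos)
print(counts)
```

Rounding snaps every a2 value to its nearest grid point, so this shortcut only gives correct counts when a2 is known to lie on (or very near) the grid and within its range. A value below the start would produce a negative index, which np.bincount rejects.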