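I have a 3D numpy array of shape (N x K x M) that holds the user ids returned by my classifier. I want to compute the Jaccard index (len(intersection) / len(union)) to check how much my cohort folds overlap. I need to compute this over the N (number of hyperparameters) and M (number of iterations) axes, while K is the number of CV folds. I want something like this:
compare A[0][0][:] with A[1:][:][:], A[0][1][:] with A[1:][:][:], and A[0][2][:] with A[1:][:][:] at the first level, and so on (if K = 3).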
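To make the quantity concrete, here is a minimal sketch of the ratio for a single pair of id arrays. The helper name jaccard and the sample arrays x and y are made up for illustration; they are not part of my actual code.

```python
import numpy as np

def jaccard(first, second):
    """Jaccard index of two 1-D id arrays: |intersection| / |union|."""
    union_len = len(np.union1d(first, second))
    if union_len == 0:
        return np.nan  # both arrays empty: the ratio is undefined
    return len(np.intersect1d(first, second)) / union_len

# Hypothetical sample data
x = np.array([1, 2, 3, 4])
y = np.array([3, 4, 5])
print(jaccard(x, y))  # intersection {3, 4}, union {1, 2, 3, 4, 5}, so 2 / 5
```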
I have tried nested loops, but the code is of course very slow. This is what I have so far:
    import numpy as np

    X = []  # Jaccard ratio for every pair of folds taken from two different elements
    for elem in range(len(total_users_splits)):
        for subelem in range(elem + 1, len(total_users_splits)):
            for i in range(n_splits):
                for j in range(n_splits):
                    first = b[elem][i]       # user ids of fold i in element elem
                    second = b[subelem][j]   # user ids of fold j in element subelem
                    total_num = len(np.union1d(first, second))
                    intersect_len = len(np.intersect1d(first, second))
                    X.append(intersect_len / total_num)

    # Summary statistics over all collected ratios
    overlap = {'overlap_ratio_mean_uids': np.nanmean(X),
               'overlap_ratio_std_uids': np.nanstd(X),
               'overlap_ratio_max_uids': np.max(X),
               'overlap_ratio_min_uids': np.min(X)}
total_users_splits is a list of dimension N x M, and n_splits is K.
The code is really slow, but I don't know how to apply np.intersect1d in a vectorized way.
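One direction that might avoid np.intersect1d altogether is sketched below, as a possibility rather than a definitive answer: encode every fold as a row of a 0/1 membership matrix, so that a single matrix product yields all intersection sizes at once, and the union sizes follow from |A| + |B| - |A ∩ B|. The function name pairwise_jaccard, the splits argument (splits[e][k] is assumed to be a 1-D array of ids) and the random sample data are assumptions for illustration. The sketch also assumes that a dense matrix of shape (number of folds x number of distinct ids) fits in memory.

```python
import numpy as np

def pairwise_jaccard(splits):
    """Jaccard ratios for all pairs of folds that come from different
    outer elements of `splits` (splits[e][k] is a 1-D array of ids)."""
    # Flatten to one deduplicated id array per fold, remembering its outer index
    sets = [np.unique(fold) for elem in splits for fold in elem]
    group = np.repeat(np.arange(len(splits)), [len(elem) for elem in splits])

    # Map every id to a column and build the 0/1 membership matrix
    all_ids, cols = np.unique(np.concatenate(sets), return_inverse=True)
    rows = np.repeat(np.arange(len(sets)), [len(s) for s in sets])
    member = np.zeros((len(sets), len(all_ids)), dtype=np.float64)
    member[rows, cols] = 1.0

    inter = member @ member.T                 # intersection sizes for all pairs
    sizes = member.sum(axis=1)                # size of each fold's id set
    union = sizes[:, None] + sizes[None, :] - inter

    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = inter / union                 # nan where both sets are empty

    # Keep only pairs whose folds belong to different outer elements (e1 < e2)
    mask = group[:, None] < group[None, :]
    return ratio[mask]

# Hypothetical sample data: 4 outer elements, 3 folds each, 20 ids per fold
rng = np.random.default_rng(0)
splits = [[rng.integers(0, 50, size=20) for _ in range(3)] for _ in range(4)]
ratios = pairwise_jaccard(splits)
overlap = {'overlap_ratio_mean_uids': np.nanmean(ratios),
           'overlap_ratio_std_uids': np.nanstd(ratios),
           'overlap_ratio_max_uids': np.nanmax(ratios),
           'overlap_ratio_min_uids': np.nanmin(ratios)}
```

The ratios come back in a different order than the nested loop produces them, which should not matter for the mean, std, max and min. If the number of distinct ids is very large, the dense membership matrix may be too big and a sparse representation would probably be needed instead.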