Vectorized intersection of numpy arrays

Asked: 2018-09-05 22:59:52

Tags: python numpy scikit-learn vectorization

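I have a 3D numpy array of user ids obtained from my classifiers, with shape (N x K x M), and I want to compute the Jaccard index (len of intersection / len of union) to check how much the folds of my cohorts overlap. I need to compute this over the N (number of hyperparameter settings) and M (number of iterations) axes, while K is the number of CV folds. I want something like: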

  

A[0][0][:] compared with A[1:][:][:], A[0][1][:] compared with A[1:][:][:], and A[0][2][:] compared with A[1:][:][:] at the first level, and so on (if k = 3).
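As a minimal sketch of the quantity being computed for a single pair of id arrays (the helper name `jaccard` and the sample arrays are made up for illustration and do not come from the original code):

```python
import numpy as np

def jaccard(first, second):
    """Jaccard index of two 1-D id arrays: |intersection| / |union|."""
    inter = np.intersect1d(first, second).size
    union = np.union1d(first, second).size
    # Two empty arrays have an empty union; return NaN instead of dividing by zero.
    return inter / union if union else np.nan

x = np.array([1, 2, 3, 4])   # hypothetical ids from one fold
y = np.array([3, 4, 5])      # hypothetical ids from another fold
print(jaccard(x, y))         # 2 shared ids out of 5 distinct ids -> 0.4
```

The goal is to evaluate this ratio for every fold of one entry against every fold of all later entries.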

I have tried nested loops, but the code is of course very slow. This is what I have so far:

import numpy as np

# b[elem][i] holds the user ids for entry elem, fold i
X = []  # Jaccard ratio for every compared pair of folds
for elem in range(len(total_users_splits)):
    # compare each entry only with the entries that come after it
    for subelem in range(elem + 1, len(total_users_splits)):
        for i in range(n_splits):
            for j in range(n_splits):
                first = b[elem][i]
                second = b[subelem][j]
                total_num = len(np.union1d(first, second))
                intersect_len = len(np.intersect1d(first, second))
                X.append(intersect_len / total_num)
overlap = {'overlap_ratio_mean_uids': np.nanmean(X),
           'overlap_ratio_std_uids': np.nanstd(X),
           'overlap_ratio_max_uids': np.max(X),
           'overlap_ratio_min_uids': np.min(X)}

total_users_splits is a list of dimension NxM, and n_splits is k.

The code is really slow, but I don't know how to apply np.intersect1d in a vectorized way.
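One possible direction, offered only as a sketch under stated assumptions: instead of vectorizing np.intersect1d itself, encode each id set as a row of a boolean membership matrix, so that a single matrix product yields all intersection sizes at once. The function name `pairwise_jaccard`, the argument name `ids`, and the layout (G, K, L) are hypothetical. The sketch assumes that every set holds the same number of integer ids and that all entries are stacked along the first axis.

```python
import numpy as np

def pairwise_jaccard(ids):
    """Jaccard index for every pair of id sets taken from different entries.

    ids: integer array of shape (G, K, L) (assumed layout); ids[g, k] is the
    set of user ids for entry g and fold k. Returns a 1-D array with one
    ratio per combination (g1 < g2, k1, k2).
    """
    G, K, L = ids.shape
    # Map each distinct id to a column index 0..U-1.
    uniq, inverse = np.unique(ids.ravel(), return_inverse=True)
    inverse = inverse.reshape(G * K, L)
    # member[r, u] is True when set r contains the id uniq[u].
    member = np.zeros((G * K, uniq.size), dtype=bool)
    member[np.arange(G * K)[:, None], inverse] = True
    m = member.astype(np.int64)
    inter = m @ m.T                          # intersection sizes for all pairs
    sizes = m.sum(axis=1)                    # number of distinct ids per set
    union = sizes[:, None] + sizes[None, :] - inter
    jac = inter / union
    # Keep only pairs whose first set belongs to an earlier entry.
    group = np.repeat(np.arange(G), K)
    return jac[group[:, None] < group[None, :]]

# Hypothetical toy input: 2 entries, 2 folds each, 3 ids per fold.
ids = np.array([[[1, 2, 3], [4, 5, 6]],
                [[1, 2, 7], [4, 8, 9]]])
X = pairwise_jaccard(ids)                    # values: 0.5, 0.0, 0.0, 0.2
overlap = {'overlap_ratio_mean_uids': np.nanmean(X),
           'overlap_ratio_std_uids': np.nanstd(X)}
```

The ratios come out in a different order than in the nested loops, which does not affect the mean, std, max, or min. The membership matrix has one column per distinct id, so with very many ids a sparse matrix (for example from scipy.sparse) may be needed to keep memory manageable.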

0 answers:

No answers yet