Permute rows in "slices" of a 3d array to match each other

Date: 2016-06-25 10:27:02

Tags: python arrays sorting numpy cluster-analysis

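I have a series of 2d arrays whose rows are points in some space. Many similar points occur in all of the arrays, but in a different row order. I want to sort the rows so that their order is as similar as possible across arrays. The points also differ too much to be clustered with K-means or DBSCAN. The problem can also be stated like this: if I stack the arrays into a 3d array, how do I permute the rows to minimize the average standard deviation (SD) along the 2nd axis? What sorting algorithm suits this problem?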
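
As a minimal sketch of the objective described above, the snippet below computes the average SD along the 2nd axis of a stacked array, using the same expression that the code further down prints. The helper name `average_sd` and the toy data are assumptions made for illustration only.

```python
import numpy as np


def average_sd(stack):
    """Mean standard deviation across slices (last axis) of an N x M x K array."""
    return np.mean(np.std(stack, axis=2))


# Three slices holding the same two points; the middle slice has its rows swapped.
points = np.array([[0.0, 0.0], [10.0, 10.0]])
stack = np.stack([points, points[::-1], points], axis=2)

misaligned = average_sd(stack)

# Undo the swap in the middle slice so that corresponding rows line up again.
aligned_stack = stack.copy()
aligned_stack[:, :, 1] = stack[::-1, :, 1]
aligned = average_sd(aligned_stack)
```

When the rows line up, the objective drops; for these identical toy points it falls to zero.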

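I have tried the following approaches.

  1. Create a reference 2d array and sort the rows of each array to minimize the Euclidean distance to that reference. I am afraid this gives biased results.

  2. Sort the rows of the arrays pairwise, then pairs of pairs, and so on... This does not work, and I am not sure why.

  3. A third approach could be plain brute-force optimization, but I am trying to avoid that because I have multiple sets of arrays to run the procedure on.

    Here is the code for my second approach (Python):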

    import numpy as np
    
    
    def reorder_to(A, B):
        """Reorder rows in A to best match rows in B.
    
        Input
        -----
        A : N x M numpy.array
        B : N x M numpy.array
    
        Output
        ------
        perm_order : permutation order
        """
    
        if A.shape != B.shape:
            print("A and B must have the same shape")
            return None
    
        N = A.shape[0]
    
        # Create a distance matrix of distance between rows in A and B
        distance_matrix = np.ones((N, N))*np.inf
        for i, a in enumerate(A):
            for ii, b in enumerate(B):
                ba = (b-a)
                distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))
    
        # Choose permutation order by smallest distances first
        perm_order = [[] for _ in range(N)]
        for _ in range(N):
            ind = np.argmin(distance_matrix)
            # Integer division keeps the row index an int (required for indexing)
            i, ii = ind // N, ind % N
            perm_order[ii] = i
            distance_matrix[i, :] = np.inf
            distance_matrix[:, ii] = np.inf
    
        return perm_order
    
    
    def permute_tensor_rows(A):
        """Permute 1d rows in 3d array along the 0th axis to minimize average SD along 2nd axis.
    
        Input
        -----
        A : numpy.3darray
            Each "slice" in the 2nd direction is an independent array whose rows can be permuted
            to decrease the average SD in the 2nd direction.
    
        Output
        ------
        A : numpy.3darray
            A with sorted rows in each "slice".
        """
        step = 2
        while step <= A.shape[2]:
            for k in range(0, A.shape[2], step):
    
                # If last, reorder to previous
                if k + step > A.shape[2]:
                    A_kk = A[:, :, k:(k+step)]
                    kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                    A[:, :, k:(k+step)] = A[kk_order, :, k:(k+step)]
                    continue
    
                # Split the current block into two halves; use integer division for slice bounds
                k_0, k_1 = k, k + step // 2
                kk_0, kk_1 = k + step // 2, k + step
    
                A_k = A[:, :, k_0:k_1]
                A_kk = A[:, :, kk_0:kk_1]
    
                order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
                A[:, :, k_0:k_1] = A[order, :, k_0:k_1]
    
            print("Step: %d \t ... Average SD: %s" % (step, np.mean(np.std(A, axis=2))))
            step *= 2
    
        return A
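
The distance matrix that `reorder_to` fills with a double loop can also be computed in one step with NumPy broadcasting. This is only a sketch of an equivalent computation; `pairwise_distances` is a hypothetical helper name that does not appear in the code above.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distance between every row of A and every row of B.

    Entry [i, j] is the distance between A[i] and B[j], matching
    distance_matrix[i, ii] in reorder_to.
    """
    diff = A[:, None, :] - B[None, :, :]  # shape (N, N, M)
    return np.sqrt(np.sum(diff * diff, axis=2))


A_demo = np.array([[0.0, 0.0], [3.0, 4.0]])
B_demo = np.array([[3.0, 4.0], [0.0, 0.0]])
D_demo = pairwise_distances(A_demo, B_demo)
```

The trade-off is memory: the intermediate array has N x N x M entries, so the loop may still be preferable for very large N.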
    

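1 Answer:

Answer 0 (score: 1)

Sorry, I should have looked at your code example; it is very informative.

There appears to be an out-of-the-box solution to your problem here: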

http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment

In my experience, it is only really practical for up to about 100 points.
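
A minimal sketch of how the linked function might stand in for the greedy matching in `reorder_to`, assuming Euclidean distance as the cost. The name `reorder_to_optimal` is hypothetical, and `cdist` from `scipy.spatial.distance` is used here only as a convenient way to build the cost matrix. Unlike the greedy choice of smallest distances first, this minimizes the total distance over all matched rows.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def reorder_to_optimal(A, B):
    """Return perm such that A[perm] best matches B row by row.

    Hypothetical counterpart to reorder_to from the question: it minimizes
    the summed Euclidean distance instead of picking pairs greedily.
    """
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")

    # cost[i, j] is the distance between row i of A and row j of B.
    cost = cdist(A, B)

    # Each returned pair (i, j) assigns row i of A to row j of B.
    rows, cols = linear_sum_assignment(cost)

    # Invert the assignment so that position j holds the matching row of A.
    perm = np.empty(len(rows), dtype=int)
    perm[cols] = rows
    return perm


# Usage: A is a shuffled copy of B, so the optimal matching recovers B's order.
rng = np.random.default_rng(0)
B = rng.normal(size=(6, 3))
A = B[rng.permutation(6)]
perm = reorder_to_optimal(A, B)
```

Indexing with `A[perm]` then yields the rows of `A` in the order of `B`, the same convention as the `perm_order` returned by `reorder_to`.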