Comparing rows in Python

Time: 2017-02-09 18:01:05

Tags: python arrays sorting numpy elements

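I am trying to write a basic script that will help me find how many similar columns there are between rows. The data is very simple, for example: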

array = np.array([[0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 0, 1, 1, 0]])

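I need to run this script over all pairs of rows in the list, so row 1 compared with row 2, row 1 compared with row 3, and so on.

Any help is greatly appreciated.

2 answers:

Answer 0: (score: 0)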

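Your title question can be solved with basic numpy techniques. Let's assume you have a two-dimensional numpy array a and you want to compare rows m and n: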

row_m = a[m, :] # this selects row index m and all column indices, thus: row m
row_n = a[n, :]
shared = row_m == row_n # this compares row_m and row_n element-by-element storing each individual result (True or False) in a separate cell, the result thus has the same shape as row_m and row_n
overlap = shared.sum() # this sums over all elements in shared, since False is encoded as 0 and True as 1 this returns the number of shared elements.
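
As a minimal sketch of the steps above, assuming the two rows from the question as the array `a` and taking `m = 0`, `n = 1` (these index values are chosen only for illustration):

```python
import numpy as np

# Example data taken from the question; m and n are illustrative choices.
a = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0]])
m, n = 0, 1

shared = a[m, :] == a[n, :]   # boolean array, one entry per column
overlap = int(shared.sum())   # number of columns where the two rows agree

print(shared)
print(overlap)
```

For these two rows the comparison is True in columns 0, 3, 4 and 6, so `overlap` is 4.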

The simplest way to apply this recipe to all pairs of rows is broadcasting:

 first = a[:, None, :] # None creates a new dimension to make space for a second row axis
 second = a[None, :, :] # Same but new dim in first axis
 # observe that axes 0 and 1 in these two arrays are arranged as for a distance map
 # a binary operation between arrays laid out this way will trigger broadcasting, i.e. numpy will compute all possible pairs in the appropriate positions
 full_overlap_map = first == second # has shape nrow x nrow x ncol
 similarity_table = full_overlap_map.sum(axis=-1) # shape nrow x nrow
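
A small runnable sketch of the broadcasting recipe. The first two rows come from the question; the third row is made up here purely so that there is more than one pair to look at:

```python
import numpy as np

# First two rows are from the question; the third row is a made-up example.
a = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1, 0]])

# (nrow, 1, ncol) == (1, nrow, ncol) broadcasts to (nrow, nrow, ncol);
# summing over the last axis counts matching columns for every row pair.
similarity_table = (a[:, None, :] == a[None, :, :]).sum(axis=-1)

print(similarity_table)
```

The table is symmetric, and each diagonal entry equals the number of columns because every row matches itself. The intermediate boolean array holds nrow x nrow x ncol entries, so for large inputs memory use may become a concern.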

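Answer 1: (score: 0)

If you can rely on all rows holding binary values, the "similar columns" count is just the following. Note that the product counts only columns where both rows are 1; columns where both are 0 are not counted.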

def count_sim_cols(row0, row1):
    return np.sum(row0*row1)
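
To see how the product differs from an equality comparison on 0/1 data, here is a small sketch using the two rows from the question (the variable names `both_one` and `equal` are only illustrative):

```python
import numpy as np

row0 = np.array([0, 1, 0, 0, 1, 0, 0])
row1 = np.array([0, 0, 1, 0, 1, 1, 0])

both_one = int(np.sum(row0 * row1))  # columns where both rows are 1
equal = int(np.sum(row0 == row1))    # columns where the rows agree (0/0 or 1/1)

print(both_one, equal)
```

Which of the two counts is the right notion of "similar" depends on whether shared zeros should count as a match.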

If a wider range of values is possible, you just replace the product with a comparison

def count_sim_cols(row0, row1):
    return np.sum(row0 == row1)

If you want to allow a tolerance on "similarity", say tol, some small value, it is just

def count_sim_cols(row0, row1):
    return np.sum(np.abs(row0 - row1) < tol)
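
A hedged sketch of the tolerance variant with `tol` passed as an argument rather than read from an enclosing scope. The name `count_sim_cols_tol`, the default value and the sample rows are assumptions made for illustration only:

```python
import numpy as np

def count_sim_cols_tol(row0, row1, tol=1e-6):
    # tol default is an arbitrary illustrative value, not a recommendation
    return int(np.sum(np.abs(row0 - row1) < tol))

# Made-up floating-point rows that differ slightly in the middle column.
row0 = np.array([0.10, 0.50, 0.90])
row1 = np.array([0.10, 0.52, 0.90])

strict = count_sim_cols_tol(row0, row1)            # middle column is too far apart
loose = count_sim_cols_tol(row0, row1, tol=0.05)   # middle column now counts

print(strict, loose)
```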

Then you can use a doubly nested loop to get the counts. Assume X is a numpy array with n rows:

sim_counts = {}
for i in range(n):  # range works on Python 3; xrange exists only on Python 2
    for j in range(i + 1, n):
        sim_counts[(i, j)] = count_sim_cols(X[i], X[j])
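
The same pairwise loop can also be written with `itertools.combinations` from the standard library. This is a sketch, not the answer's original code: it uses the equality-based `count_sim_cols`, and the third row of `X` is made up for illustration:

```python
from itertools import combinations

import numpy as np

# First two rows are from the question; the third row is a made-up example.
X = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1, 0]])

def count_sim_cols(row0, row1):
    return int(np.sum(row0 == row1))

# combinations(range(n), 2) yields each pair (i, j) with i < j exactly once.
sim_counts = {(i, j): count_sim_cols(X[i], X[j])
              for i, j in combinations(range(len(X)), 2)}

print(sim_counts)
```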