Extracting common elements from multiple arrays to create a new array

Time: 2014-02-08 07:03:14

Tags: python numpy scipy

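In the example below, data1, data2, ... data10 are the given arrays. I need to find the elements that are present in all of the given arrays. Then I need to create a new array that contains only those common elements and sets every other element to nan.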

import numpy as np

data1 = np.array([[1,2,33,4,33,6],[7,8,9,10,93,12]])
data2 = np.array([[1,14,33,15,33,17],[18,19,20,21,93,23]])
data3 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,34]])
data4 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,34]])
data5 = np.array([[67,25,33,26,1,28],[93,30,31,32,93,34]])
data6 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,34]])
data7 = np.array([[24,25,33,26,1,28],[93,30,31,51,93,34]])
data8 = np.array([[48,25,33,26,1,28],[93,30,31,32,93,34]])
data9 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,38]])
data10 = np.array([[24,25,33,26,1,28],[73,30,31,32,93,34]])

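The desired result is shown below as the array `result`: only the values 33 and 93 appear at the same position in all of the given arrays (i.e., they overlap). In other words, I need to find the overlapping elements, those that have the same value at the same position in every array.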

result = np.array([[np.nan, np.nan, 33, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, 93, np.nan]])

Note: for a small number of given arrays (say 3), this was answered with the code below in the question Extraction of common element in given arrays to make a new array

import numpy as np
result = np.empty_like(data1, dtype=float)
# Make an array of True-False values storing which indices are the same
indices = (data1==data2) * (data2==data3)
result[indices] = data1[indices]
result[~indices] = np.nan

However, when there are many given arrays (say 10), what is an efficient way to compute this? Any ideas would be greatly appreciated.

3 answers:

Answer 0 (score: 1)

The numpy way:

import numpy as np
data1 = np.array([[1,2,33,4,33,6],[7,8,9,10,93,12]])
data2 = np.array([[1,14,33,15,33,17],[18,19,20,21,93,23]])
data3 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,34]])
data4 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,34]])
data5 = np.array([[67,25,33,26,1,28],[93,30,31,32,93,34]])
data6 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,34]])
data7 = np.array([[24,25,33,26,1,28],[93,30,31,51,93,34]])
data8 = np.array([[48,25,33,26,1,28],[93,30,31,32,93,34]])
data9 = np.array([[24,25,33,26,1,28],[93,30,31,32,93,38]])
data0 = np.array([[24,25,33,26,1,28],[73,30,31,32,93,34]])
data = np.dstack((data1, data2, data3, data4, data5, data6, data7, data8, data9, data0))
np.where(np.all(np.equal(data, data1[...,np.newaxis]), axis=2), data1, np.nan)

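`equal` performs an element-wise comparison and produces an array with the same shape as `data`: `data1[..., np.newaxis]` has shape (2, 6, 1) and is broadcast against `data`, which has shape (2, 6, 10).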
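A small sketch of the shapes involved, using three made-up 2×3 arrays `a`, `b`, `c` (hypothetical stand-ins for data1, data2, ..., not from the question). Adding a trailing axis to the reference array is what lets it broadcast along the "which array" axis of the stacked data.

```python
import numpy as np

# Three small hypothetical arrays standing in for data1, data2, ...
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 0, 3], [4, 5, 0]])
c = np.array([[1, 2, 3], [0, 5, 6]])

stacked = np.dstack((a, b, c))     # shape (2, 3, 3): the last axis indexes the arrays
reference = a[..., np.newaxis]     # shape (2, 3, 1)
eq = np.equal(stacked, reference)  # broadcast to (2, 3, 3), element-wise booleans

print(stacked.shape, reference.shape, eq.shape)  # (2, 3, 3) (2, 3, 1) (2, 3, 3)
```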

`all` (with `axis=2`) then selects the positions where all values equal those of `data1`.
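To illustrate the `all` step on its own, here is a sketch with the same kind of small hypothetical arrays (not the question's data): reducing the boolean comparison along the last axis leaves one True/False per position.

```python
import numpy as np

# Hypothetical small arrays, not the question's data
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 0, 3], [4, 5, 0]])
c = np.array([[1, 2, 3], [0, 5, 6]])

stacked = np.dstack((a, b, c))
eq = stacked == a[..., np.newaxis]

# True only at positions where every array has the same value as a
mask = np.all(eq, axis=2)
print(mask)
```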

`where` takes 3 arguments (x1, x2, x3): where x1 is True, the resulting array takes its value from x2; where x1 is False, it takes its value from x3.
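A minimal sketch of the three-argument `np.where` on a hand-written mask and value array (both made up for illustration). Note that because `np.nan` is a float, the integer values are upcast, which is why the output above shows `33.` and `93.`.

```python
import numpy as np

mask = np.array([[True, False, True], [False, True, False]])
values = np.array([[1, 2, 3], [4, 5, 6]])

# Where mask is True take the value from `values`; otherwise use NaN.
# NaN is a float, so the result has dtype float64.
result = np.where(mask, values, np.nan)
print(result)
```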

Result:

Out[22]:
array([[ nan,  nan,  33.,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  93.,  nan]])

Answer 1 (score: 1)

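If at all possible, I really recommend initializing something like this as a single array from the start. My answer to the other question shows that approach.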
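A minimal sketch of what "a single array from the start" might look like, assuming the number of arrays and their shape are known in advance (here 10 arrays of shape 2×6, as in the question). The names `n_arrays`, `rows`, `cols` and the fill values are illustrative assumptions; in real code each slice would be filled from wherever the data comes from.

```python
import numpy as np

n_arrays, rows, cols = 10, 2, 6

# One preallocated 3D array instead of ten separate variables;
# data[:, :, k] plays the role of data{k+1}.
data = np.zeros((rows, cols, n_arrays), dtype=int)

# Fill each slice (illustrative values only).
for k in range(n_arrays):
    data[:, :, k] = np.arange(rows * cols).reshape(rows, cols)
data[0, 0, 3] = -1  # make one position disagree

# Positions where every slice equals the first one
mask = np.all(data == data[:, :, :1], axis=-1)
result = np.where(mask, data[:, :, 0], np.nan)
```

With the data already in one array, the comparison needs no `np.dstack` call and works unchanged for any number of slices.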

Adapting my answer to the other question, here is how to use broadcasting and boolean indexing in this case.

import numpy as np
# stack the ten 2D arrays into one 3D array of shape (2, 6, 10)
# each data array will be a slice of this array, e.g. data[:,:,0], data[:,:,1], etc.
data = np.dstack((data1, data2, data3, data4, data5, data6, data7, data8, data9, data10))
# make an empty array to store the results as before
result = np.empty_like(data[:,:,0], dtype=float)
# use broadcasting to test for equality
# Note the slicing by range in the third dimension to keep dimensions aligned
# for broadcasting to work properly.
indices = np.all(data == data[:,:,0:1], axis=-1)
# Do the assignment as before
result[indices] = data[:,:,0][indices]
result[~indices] = np.nan
result

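Again, this is equivalent to the np.where approach shown in the other answer, but I prefer boolean indexing for readability, even though it may be slightly slower.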
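A quick check, on a deterministic stand-in array (made up for illustration, not the question's data), that the boolean-indexing version and the `np.where` version give the same result. It also shows why the reference slice is taken as `0:1` rather than `0`: the range slice keeps the last axis so the comparison broadcasts.

```python
import numpy as np

# Hypothetical stand-in for 10 stacked (2, 6) arrays; values vary along the
# last axis, except at two positions forced to match in every array.
data = np.arange(2 * 6 * 10).reshape(2, 6, 10) % 7
data[0, 2, :] = 33
data[1, 4, :] = 93

print(data[:, :, 0].shape, data[:, :, 0:1].shape)  # (2, 6) (2, 6, 1)

indices = np.all(data == data[:, :, 0:1], axis=-1)

# Boolean-indexing version (this answer)
result_bool = np.empty_like(data[:, :, 0], dtype=float)
result_bool[indices] = data[:, :, 0][indices]
result_bool[~indices] = np.nan

# np.where version (the other answer)
result_where = np.where(indices, data[:, :, 0], np.nan)

# assert_array_equal treats NaNs in matching positions as equal
np.testing.assert_array_equal(result_bool, result_where)
```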

Answer 2 (score: 0)

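If you can find the common elements between two arrays, you should be able to use the resulting array for further indexing. This is also very efficient (if n = numArrays and m = lengthArray, worst case: O(n * m), best case: O(m)).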

Something like this:

from functools import reduce  # reduce is not a builtin in Python 3
allCommon = reduce(find_common_elements_in_two_arrays, allArrays)
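The answer leaves `find_common_elements_in_two_arrays` and `allArrays` undefined. Below is a minimal sketch of one possible definition, assuming "common" means "same value at the same position" as in the question: keep the first array's value where the two arrays agree and put NaN elsewhere. Because NaN never compares equal to anything, a position that has become NaN stays NaN in later steps, so `reduce` can fold the helper over the whole list. The helper body is an assumption, not part of the original answer.

```python
from functools import reduce

import numpy as np

def find_common_elements_in_two_arrays(a, b):
    # Hypothetical helper: keep a's value where a and b agree position-wise,
    # NaN elsewhere. NaN != anything, so NaN positions stay NaN afterwards.
    return np.where(a == b, a, np.nan)

data1 = np.array([[1, 2, 33, 4, 33, 6], [7, 8, 9, 10, 93, 12]])
data2 = np.array([[1, 14, 33, 15, 33, 17], [18, 19, 20, 21, 93, 23]])
data3 = np.array([[24, 25, 33, 26, 1, 28], [93, 30, 31, 32, 93, 34]])

allArrays = [data1, data2, data3]
allCommon = reduce(find_common_elements_in_two_arrays, allArrays)
print(allCommon)
```

As written, this sketch always makes n - 1 passes over the m elements (O(n * m)); it has no early exit.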