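I have a numpy array of numpy arrays (I'd be happy to use a list of numpy arrays instead), and I want to edit the overall array. More specifically, I check whether the arrays inside the larger array share values and, if they do, remove the shared values from the smaller array.
The problem I'm having is that when I try to put the modified arrays back into the containing array, the final output once the while loops finish does not retain the updated modules.
I believe this has to do with the subtleties of how Python copies/views items: when I access element i or j of the overall array, I am creating a new object inside the while loop rather than editing that element within the larger array. However, I'm happy to admit that, despite hours of trying, I don't fully understand this and have run out of alternatives to try.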
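As a small illustration of the behaviour being asked about (a sketch that does not diagnose the loop below; the names `modules` and `query` are made up for the example): indexing an object array returns the stored array itself, `np.delete` returns a new array, and the container only changes when the result is assigned back by index.

```python
import numpy as np

# Build a 1-D object array holding two integer arrays of different lengths.
modules = np.empty(2, dtype=object)
modules[0] = np.array([1, 2, 3])
modules[1] = np.array([3, 4])

# Indexing an object array hands back the stored array object itself.
query = modules[1]

# np.delete returns a new array, so this only rebinds the local name `query`;
# the array stored in modules[1] is untouched.
query = np.delete(query, 0)
length_before_write_back = len(modules[1])  # still 2

# Assigning by index stores the new array in the container.
modules[1] = query
length_after_write_back = len(modules[1])  # now 1
```

So an index assignment such as `Feature_Modules[i] = ...` does persist, as long as the name `Feature_Modules` still refers to the same container afterwards.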
#Feature_Modules is an array (or list) of number arrays, each containing a set of integers
i = 0
j = 0
while i < Feature_Modules.shape[0]: # Check element i against every other element j
    if i != j:
        Ref_Module = Feature_Modules[i]
        while j < Feature_Modules.shape[0]:
            if i != j:
                Query_Module = Feature_Modules[j]
                if np.array_equal(np.sort(Ref_Module),np.sort(Query_Module)) == 1: # If modules contain exactly the same integers, delete one of this. This bit actually works and is outputted at the end.
                    Feature_Modules = np.delete(Feature_Modules,j)
                Shared_Features = np.intersect1d(Ref_Module, Query_Module)
                if Shared_Features.shape[0] > 0 and np.array_equal(np.sort(Ref_Module),np.sort(Query_Module)) == 0: # If the modules share elements, remove the shared elements from the smaller module. This is the bit that isn't outputted in the final Feature_Modules object.
                    Module_Cardinalities = np.array([Ref_Module.shape[0],Query_Module.shape[0]])
                    Smaller_Group = np.where(Module_Cardinalities == np.min(Module_Cardinalities))[0][0]
                    New_Groups = np.array([Ref_Module,Query_Module])
                    New_Groups[Smaller_Group] = np.delete(New_Groups[Smaller_Group],np.where(np.isin(New_Groups[Smaller_Group],Shared_Features) == 1))
                    Feature_Modules = Feature_Modules.copy()
                    Feature_Modules[i] = New_Groups[0] # Replace the current module of Feature_Modules with the new module (Isn't outputted at end of while loops)
                    Feature_Modules[j] = New_Groups[1] # Replace the current module of Feature_Modules with the new module (Isn't outputted at end of while loops)
                else:
                    j = j + 1
            else:
                j = j + 1
    else:
        i = i + 1
    i = i + 1
So, taking this small dataset as an example,
Feature_Modules = np.array([np.array([1,2,3,4,5,6,7,8]), np.array([9,10,1,2,3,4]), np.array([20,21,22,23])], dtype=object)
the new Feature_Modules should be:
Feature_Modules = np.array([np.array([1,2,3,4,5,6,7,8]), np.array([9,10]), np.array([20,21,22,23])], dtype=object)
because the values shared by arrays [0] and [1] have been removed from [1], since it is the smaller array.
Answer 0 (score: 0)
I suggest a more Pythonic, NumPy-style approach to the code:
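import numpy as np

# dtype=object is required because the inner arrays have different lengths
Feature_Modules = np.array([np.array([1,2,3,4,5,6,7,8]), np.array([9,10,1,2,3,4]), np.array([20,21,22,23])], dtype=object)

for n1, arr1 in enumerate(Feature_Modules[:-1]):
    for n2, arr2 in enumerate(Feature_Modules[n1+1:]):
        # Re-read module n1: an earlier pass of this loop may have replaced it
        arr1 = Feature_Modules[n1]
        l1, l2 = len(arr1), len(arr2)
        intersect, ind1, ind2 = np.intersect1d(arr1, arr2, return_indices=True)
        if len(intersect) == 0:
            continue
        if l1 > l2:
            # Remove the shared values from the later (smaller) module
            Feature_Modules[n2+n1+1] = np.delete(arr2, ind2)
        else:
            # Remove the shared values from module n1
            Feature_Modules[n1] = np.delete(arr1, ind1)

print(Feature_Modules)
# [array([1, 2, 3, 4, 5, 6, 7, 8]) array([ 9, 10]) array([20, 21, 22, 23])]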
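Since the question mentions that a list of numpy arrays would also be acceptable, the same idea can be sketched with a plain list. This is only one possible variant, not part of the original answer: the function name `remove_shared` is hypothetical, it uses `np.isin` with a boolean mask instead of `return_indices`, and it does not delete modules that are exact duplicates of each other.

```python
import numpy as np

def remove_shared(modules):
    """Return a new list in which, for each overlapping pair, the shared
    values are removed from the smaller module (ties: from the earlier one)."""
    result = [np.asarray(m) for m in modules]  # new list; the input list is not modified
    for i in range(len(result) - 1):
        for j in range(i + 1, len(result)):
            shared = np.intersect1d(result[i], result[j])
            if shared.size == 0:
                continue
            # Pick the smaller module; on a tie the earlier one is trimmed,
            # mirroring the `else` branch in the loop above.
            target = j if len(result[i]) > len(result[j]) else i
            # Boolean indexing builds a new array, so input arrays stay intact.
            result[target] = result[target][~np.isin(result[target], shared)]
    return result

modules = [np.array([1, 2, 3, 4, 5, 6, 7, 8]),
           np.array([9, 10, 1, 2, 3, 4]),
           np.array([20, 21, 22, 23])]
cleaned = remove_shared(modules)
```

A list avoids the need for `dtype=object`, and always reading `result[i]` and `result[j]` by index means each comparison sees the current state of both modules.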
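Edit:
This code modifies the original array in place, which is how it keeps track of the elements that have already been removed. If you want to keep the original array intact, make a copy of it first:
copy = np.array(original)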
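A brief sketch of what that copy does for an object array, using small made-up data: `np.array(original)` creates a new outer container that still refers to the same inner arrays. That is enough here, assuming elements are replaced by index assignment (as in the loop above) rather than modified in place.

```python
import numpy as np

original = np.empty(2, dtype=object)
original[0] = np.array([1, 2, 3, 4])
original[1] = np.array([9, 10, 1, 2])

copy = np.array(original)  # new outer array, same inner array objects

# The copy is shallow: both containers point at the same inner array.
shares_inner = copy[0] is original[0]

# Replacing an element of the copy leaves the original container untouched.
copy[1] = np.delete(copy[1], [2, 3])
```

If the inner arrays themselves were going to be modified in place, a deep copy (for example `copy.deepcopy` from the standard library) would be the safer choice.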