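I have a NumPy array of shape NxM.
For example:
    input_data = np.random.rand(10, 5)
I want to create a new array that holds every possible difference between pairs of columns of input_data. For this example that gives an array of shape (10, 10), because 5 columns yield 10 distinct pairs.
My code so far is: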
    import itertools
    import numpy as np

    def get_data_differences(read_data):
        '''Finds every possible difference between the columns of read_data.
        read_data: NxM array, where M is the number of features
        Returns data_difference, an NxR array, where
        R is the number of possible combinations of 2 columns.
        '''
        if len(read_data.shape) != 2:
            raise ValueError('The data format is not consistent')
        data_rows, data_columns = read_data.shape
        # Placeholder column of zeros so np.append has something to extend
        data_difference = np.zeros((data_rows, 1))
        for combination_pair in itertools.combinations(read_data.T, 2):
            # Iterate over every possible pairing of columns (hence the .T)
            minuend_, subtrahend_ = combination_pair
            difference_ = minuend_ - subtrahend_
            data_difference = np.append(data_difference, difference_[:, None], axis=1)
        # Drop the placeholder column of zeros
        data_difference = np.delete(data_difference, 0, 1)
        return data_difference
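Having to delete the initial array of zeros that I created seems inefficient to me.
If you have a better suggestion, that would be great.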
Answer 0 (score: 2)
Why not index multiple columns at once?
    from itertools import combinations
    np.diff(read_data[:, list(combinations(range(read_data.shape[1]), 2))])[..., 0]
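One thing worth checking with the one-liner: np.diff subtracts the earlier element from the later one, so it returns column j minus column i for each pair (i, j), which is the opposite sign from the minuend - subtrahend order in the question. The sketch below is one possible way to keep the question's sign while still avoiding the loop and the placeholder array. The function name pairwise_column_differences and the use of np.triu_indices are my own choices for illustration, not something from the question or the answer.

```python
from itertools import combinations

import numpy as np


def pairwise_column_differences(data):
    """Return data[:, i] - data[:, j] for every column pair with i < j."""
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError("expected a 2-D array of shape (N, M)")
    # Indices of the strict upper triangle: one (i, j) entry per column
    # pair, in the same order that itertools.combinations produces.
    first, second = np.triu_indices(data.shape[1], k=1)
    return data[:, first] - data[:, second]


rng = np.random.default_rng(0)
input_data = rng.random((10, 5))

vectorized = pairwise_column_differences(input_data)

# Same indexing idea as the one-liner above, kept here for comparison.
pairs = np.array(list(combinations(range(input_data.shape[1]), 2)))
one_liner = np.diff(input_data[:, pairs])[..., 0]

print(vectorized.shape)
print(np.allclose(one_liner, -vectorized))
```

Both versions build the whole result in a single indexing step, so no column has to be appended or deleted afterwards. If the sign does not matter for your use case, the one-liner is enough as it stands.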