Question

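I'm working with a huge dataset. What I want to do is take all values > 0 from an array, put them in a new array, run statistics on those extracted values, and then put the new values back into the original array.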

Say I have an array [0, 0, 0, 0, 0, ..., .32, .44, 0, 0, 0] (i.e. the object arr in the script below). I want to remove values such as .32, .44, etc. and put them in a new array, arr2.

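Then I want to run a statistical analysis (PCA) on the second array, get the new values that correspond to the original positions in the original array, and replace the original values with these new values. I've started coding this below, but I don't know how to extract the values > 0 while keeping track of their positions in the array.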

import numpy as np
import numpy.linalg as npl
import nibabel as nib
import matplotlib.pyplot as plt

np.set_printoptions(precision=4, suppress=True)
FA = './all_FA_skeletonised.nii'

img = nib.load(FA)
data = img.get_fdata()  # get_data() is deprecated in recent nibabel versions
data.shape        #get x,y,z and subject # parameters from image

#place subject number into a variable
vol_shape = data.shape[:-1] # x,y,z coordinates
n_vols = data.shape[-1]   # 28 subjects volumes

# N is the num of voxels (dimensions) in a volume
N = np.prod(vol_shape)

#- Reshape first dimension of whole image data array to N, and take
#- transpose
arr2 = []
arr = data.reshape(N, n_vols).T  # 28 X 7,200,000 array
for a in arr:
    if a > 0:
        arr2.append(a)

row_means = np.outer(np.mean(arr2, axis=1), np.ones(N))
X = arr2 - row_means # mean center data array

#- Calculate unscaled covariance matrix of X
unscaled_covariance = X.dot(X.T)
unscaled_covariance.shape

# Calculate U, S, VT with SVD on unscaled covariance matrix
U, S, VT = npl.svd(unscaled_covariance)
#- Use subplots to make axes to plot first 10 principal component
#- vectors
#- Plot one component vector per sub-plot.
fig, axes = plt.subplots(10, 1)
for i, ax in enumerate(axes):
    ax.plot(U[:, i])

#- Calculate scalar projections for projecting X onto U
#- Put results into array C.
C = U.T.dot(X)

#- Put values in C back into original data matrix

Answer 1

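I would extract the values you need together with their positions (in the original array) and store them in a dictionary as index_in_the_original_array: value_in_the_original_array. Then I would run the computations on the dictionary's values. Finally, because we kept the indices (as the dictionary keys), we can use them to replace the values in the original array. In code: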

from pprint import pprint

original_array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Collecting all values & indices of the elements that are greater than 5:
my_dictionary = {index: value for index, value in enumerate(original_array) if value > 5}
pprint(original_array)      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pprint(my_dictionary)       # {5: 6, 6: 7, 7: 8, 8: 9, 9: 10}

# doing the processing (Here just incrementing the values by 2):
my_dictionary = {key: my_dictionary[key] + 2 for key in my_dictionary.keys()}
pprint(my_dictionary)       # {5: 8, 6: 9, 7: 10, 8: 11, 9: 12}

# Replacing the new values into the original array:
for key in my_dictionary.keys():
    original_array[key] = my_dictionary[key]

pprint(original_array)      # [1, 2, 3, 4, 5, 8, 9, 10, 11, 12]
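If the data is already a NumPy array, the same extract, process, and write-back steps can be sketched with a boolean mask instead of a dictionary. The mask plays the role of the dictionary keys: it records which positions were extracted. This is a minimal sketch; the names arr, mask, selected and processed are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Boolean mask marking the positions to extract (True where value > 5)
mask = arr > 5

# Extract the selected values; they keep the order they have in arr
selected = arr[mask]

# Process the extracted values (here: add 2, as in the dictionary example)
processed = selected + 2

# Write the results back to the same positions
arr[mask] = processed
print(arr)  # [ 1  2  3  4  5  8  9 10 11 12]
```

Because the whole operation is vectorized, there is no Python-level loop, which should matter for an array with millions of elements like the one in the question.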

Update

If we want to avoid using a dictionary, we can do the following, which is essentially the same as above.

import numpy as np

def process_data(data):
    return data * 5

original_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
new_array = np.array([[index, value] for index, value in enumerate(original_array) if value > 5])
print(new_array)    # [[ 5  6]
                    #  [ 6  7]
                    #  [ 7  8]
                    #  [ 8  9]
                    #  [ 9 10]]

# doing the processing (Here, just using the above function that multiplies the values by 5):
new_array[:, 1] = process_data(new_array[:, 1])
print(new_array)    # [[ 5 30]
                    #  [ 6 35]
                    #  [ 7 40]
                    #  [ 8 45]
                    #  [ 9 50]]

# Replacing the new values into the original array:
for indx, val in new_array:
    original_array[indx] = val

print(original_array)  # [ 1  2  3  4  5 30 35 40 45 50]
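One caveat with stacking indices and values into a single array: NumPy gives the stacked array one common dtype, so if original_array holds floats (as the .32, .44 values in the question do), the indices become floats too and original_array[indx] raises an IndexError. A minimal sketch that keeps the indices in their own integer array, using np.flatnonzero; process_data is the same illustrative function as above.

```python
import numpy as np


def process_data(data):
    # Illustrative processing step: multiply by 5
    return data * 5


original_array = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

# Integer positions of the elements greater than 5
indices = np.flatnonzero(original_array > 5)

# The values at those positions, kept in a separate float array
values = original_array[indices]

# Process only the extracted values
new_values = process_data(values)

# Write them back with integer (fancy) indexing; no Python loop needed
original_array[indices] = new_values
print(original_array)
```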

Answer 2

Edit: I misread the question (see the comments), so here is an update.

Suppose we have a=[0, 0, 1, 2, 0, 3] and b=[.1, .1, .1] and want to combine them to get [0, 0, .1, .1, 0, .1], i.e. the 0s stay at the same indices and all other values are replaced:

import numpy as np
b = np.array([.1, .1, .1])
a = np.array([0,0,1,2,0,3], dtype='float64')  # float dtype, so the .1 values from b are not truncated to 0
np.place(a, a>0, b)  # modify in place

If you need the original values, make a copy of a before the np.place line.
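For comparison, here is a sketch of the same replacement using boolean-mask assignment, including the backup copy (a_backup, c, d and mismatch_raised are illustrative names). One behavioural difference worth knowing: np.place repeats b if it is shorter than the number of True entries, whereas a[a > 0] = b requires b to have exactly that many elements (or a single one) and raises a ValueError otherwise.

```python
import numpy as np

b = np.array([.1, .1, .1])
a = np.array([0, 0, 1, 2, 0, 3], dtype='float64')

# Keep a copy of the original values before overwriting them
a_backup = a.copy()

# Boolean-mask assignment: len(b) must match the number of True entries (3 here)
a[a > 0] = b
print(a)         # [0.  0.  0.1 0.1 0.  0.1]
print(a_backup)  # [0. 0. 1. 2. 0. 3.]

# np.place repeats vals when it is shorter than the number of True entries
c = np.array([0, 1, 2, 3, 4], dtype='float64')
np.place(c, c > 0, [9, 8])
print(c)         # [0. 9. 8. 9. 8.]

# The equivalent mask assignment refuses the length mismatch
d = np.array([0, 1, 2, 3, 4], dtype='float64')
try:
    d[d > 0] = [9, 8]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```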

Previous version:

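I'm not sure I understood you correctly. Assuming that by "keeping the position in the array" you mean that, for example, [0,0,1,2,0,3,0] should evaluate to [1,2,3] (and not [1,3,2] or anything else), you can do this with a[a!=0], where a is your array. If you only want to strip leading/trailing zeros, try numpy.trim_zeros.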
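The two options above, applied to the example array from this answer:

```python
import numpy as np

a = np.array([0, 0, 1, 2, 0, 3, 0])

# Boolean indexing keeps every non-zero value, in its original order
nonzero_values = a[a != 0]
print(nonzero_values)   # [1 2 3]

# trim_zeros only strips leading and trailing zeros; inner zeros stay
trimmed = np.trim_zeros(a)
print(trimmed)          # [1 2 0 3]
```

Note that a[a != 0] alone does not remember where the values came from; keeping the mask a != 0 around is what lets you write processed values back later.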

If the input is a 2D array or a matrix, things are different, because you need to preserve its shape.
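For the 2D case in the question (arr is subjects × voxels), one way to preserve the shape is to mask whole voxel columns rather than single elements. This is a minimal sketch on a small made-up array. It assumes a voxel should be kept when it is > 0 in at least one subject (whether .any or .all is right depends on the data), and it uses row-wise mean-centering as a stand-in for the PCA step. voxel_mask and result are illustrative names.

```python
import numpy as np

# Made-up stand-in for arr: 3 subjects x 6 voxels, zeros outside the skeleton
arr = np.array([
    [0.0, 0.3, 0.0, 0.5, 0.0, 0.2],
    [0.0, 0.4, 0.0, 0.6, 0.0, 0.1],
    [0.0, 0.2, 0.0, 0.7, 0.0, 0.3],
])

# 1D boolean mask over voxels: True where any subject has a value > 0
voxel_mask = (arr > 0).any(axis=0)

# Extract only the selected voxel columns: shape (3, n_selected)
arr2 = arr[:, voxel_mask]

# Stand-in for the statistics/PCA step: mean-center each subject's row
X = arr2 - arr2.mean(axis=1, keepdims=True)

# Write the processed columns back into a copy; the (3, 6) shape is kept
result = arr.copy()
result[:, voxel_mask] = X
```

In the question's code, C = U.T.dot(X) has the same shape as X (U is n_vols × n_vols), so a line like result[:, voxel_mask] = C should put the projected values back at their original voxel positions.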

How to remove values from a Python array, perform operations on them, and then replace them in the original array
