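I am working with a very large dataset. What I want to do is take all values > 0 from an array and put them in a new array, run statistics on those extracted values, and then put the new values back into the original array.
Suppose I have an array [0,0,0,0,0, . . . .32, .44,0,0,0] (the object arr in the script below). I want to pull out values such as .32, .44, etc. and put them into a new array, arr2.
I then want to run a statistical analysis (PCA) on that second array, obtain new values that correspond to the original positions in the original array, and replace the original values with these new values. I have started coding this below, but I don't know how to extract the values > 0 while keeping track of their positions in the array.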
import os
import nibabel as nb
import numpy as np
import numpy.linalg as npl
import nibabel as nib
import matplotlib.pyplot as plt
from matplotlib.mlab import PCA
#from dipy.io.image import load_nifti, save_nifti
np.set_printoptions(precision=4, suppress=True)
FA = './all_FA_skeletonised.nii'
from dipy.io.image import load_nifti
img = nib.load(FA)
data = img.get_data()
data.shape #get x,y,z and subject # parameters from image
#place subject number into a variable
vol_shape = data.shape[:-1] # x,y,z coordinates
n_vols = data.shape[-1] # 28 subjects volumes
# N is the num of voxels (dimensions) in a volume
N = np.prod(vol_shape)
#- Reshape first dimension of whole image data array to N, and take
#- transpose
arr2 = []
arr = data.reshape(N, n_vols).T # 28 X 7,200,000 array
for a in arr:
    # attempt to keep only the values > 0 (this is the step I am stuck on)
    if a > 0:
        arr2.append(a)
row_means = np.outer(np.mean(arr2, axis=1), np.ones(N))
X = arr2 - row_means # mean center data array
#- Calculate unscaled covariance matrix of X
unscaled_covariance = X.dot(X.T)
unscaled_covariance.shape
# Calculate U, S, VT with SVD on unscaled covariance matrix
U, S, VT = npl.svd(unscaled_covariance)
#- Use subplots to make axes to plot first 10 principal component
#- vectors
#- Plot one component vector per sub-plot.
fig, axes = plt.subplots(10, 1)
for i, ax in enumerate(axes):
    ax.plot(U[:, i])
#- Calculate scalar projections for projecting X onto U
#- Put results into array C.
C = U.T.dot(X)
#- Put values in C back into original data matrix
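Answer 0 (score: 1)
I would extract the desired values together with their positions (in the original array) and store them in a dictionary as index_in_the_original_array: value_in_the_original_array. Then I would run the computation on the values of the dictionary. Finally, because we kept the indices (as the keys of the dictionary), we can use them to replace the values in the original array. In code: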
from pprint import pprint
original_array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Collecting all values & indices of the elements that are greater than 5:
my_dictionary = {index: value for index, value in enumerate(original_array) if value > 5}
pprint(original_array) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pprint(my_dictionary) # {5: 6, 6: 7, 7: 8, 8: 9, 9: 10}
# doing the processing (Here just incrementing the values by 2):
my_dictionary = {key: my_dictionary[key] + 2 for key in my_dictionary.keys()}
pprint(my_dictionary) # {5: 8, 6: 9, 7: 10, 8: 11, 9: 12}
# Replacing the new values into the original array:
for key in my_dictionary.keys():
    original_array[key] = my_dictionary[key]
pprint(original_array) # [1, 2, 3, 4, 5, 8, 9, 10, 11, 12]
Update
If we want to avoid using a dictionary, we can do the following, which is essentially the same as above.
import numpy as np
def process_data(data):
    return data * 5
original_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
new_array = np.array([[index, value] for index, value in enumerate(original_array) if value > 5])
print(new_array) # [[ 5 6]
# [ 6 7]
# [ 7 8]
# [ 8 9]
# [ 9 10]]
# doing the processing (Here, just using the above function that multiplies the values by 5):
new_array[:, 1] = process_data(new_array[:, 1])
print(new_array) # [[ 5 30]
# [ 6 35]
# [ 7 40]
# [ 8 45]
# [ 9 50]]
# Replacing the new values into the original array:
for indx, val in new_array:
    original_array[indx] = val
print(original_array) # [ 1 2 3 4 5 30 35 40 45 50]
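The same round trip can probably be written more compactly with a NumPy boolean mask, which avoids building index/value pairs in a Python loop and may matter for a very large array. This is only a minimal sketch: process_data is the same placeholder as above, and the names mask and selected are introduced here for illustration.

```python
import numpy as np


def process_data(values):
    # Placeholder for the real analysis; here it just multiplies by 5.
    return values * 5


original_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Boolean mask marking the positions of the elements greater than 5.
mask = original_array > 5

# Extract the selected values; their order follows the original positions.
selected = original_array[mask]

# Write the processed values back to exactly the same positions.
original_array[mask] = process_data(selected)

print(original_array)  # [ 1  2  3  4  5 30 35 40 45 50]
```

The mask plays the role of the dictionary keys: as long as it is kept around, the processed values can be written back in the same order they were extracted.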
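Answer 1 (score: 0)
Edit: I misread the question (see the comments), so here is an update.
Suppose we have a=[0,0,1,2,0,3] and b=[.1, .1, .1], and want to combine them to get [0, 0, .1, .1, 0, .1], i.e. the zeros stay at the same indices and all other values are replaced: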
import numpy as np
b = np.array([.1, .1, .1])
a = np.array([0,0,1,2,0,3], dtype='float64') # expects same dtype
np.place(a, a>0, b) # modify in place
If you need the original values, make a backup of a before the np.place line.
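As a small sketch of that backup step (the name a_backup is just an illustration), note that a plain assignment would only create a second reference to the same array, so an explicit copy is needed:

```python
import numpy as np

a = np.array([0, 0, 1, 2, 0, 3], dtype='float64')
b = np.array([.1, .1, .1])

# Independent copy of the data; `a_backup = a` would only alias the same array.
a_backup = a.copy()

# Overwrite the positive entries of a, in order, with the values from b.
np.place(a, a > 0, b)

print(a)         # [0.  0.  0.1 0.1 0.  0.1]
print(a_backup)  # [0. 0. 1. 2. 0. 3.]
```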
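Previous version:
I am not sure I understood you correctly. I assume that by 'keeping the position in the array' you mean that, for example, [0,0,1,2,0,3,0] should evaluate to [1,2,3] (and not [1,3,2] or something else). You can do this with a[a!=0], where a is your array. If you only want to strip leading/trailing zeros, try numpy.trim_zeros.
If the input is a 2D array or matrix, things are different, because you need to preserve its shape.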
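To make the difference concrete, here is a short sketch of the three cases. The 2D example data and the "multiply by 2" processing step are made up for illustration. For a 2D array, boolean indexing returns a flat 1D array, but assigning through the same mask puts values back into their original cells, so the shape is preserved.

```python
import numpy as np

a = np.array([0, 0, 1, 2, 0, 3, 0])

nonzero = a[a != 0]          # all non-zero values, original order kept
trimmed = np.trim_zeros(a)   # only leading/trailing zeros removed
print(nonzero)  # [1 2 3]
print(trimmed)  # [1 2 0 3]

# 2D case (illustrative data): the mask has the same shape as m.
m = np.array([[0.0, 0.5, 0.0],
              [0.25, 0.0, 0.75]])
mask = m != 0

values = m[mask]             # flattened in row-major order: 0.5, 0.25, 0.75
m[mask] = values * 2         # stand-in for the real processing step
print(m.shape)  # (2, 3)
```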