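I have to process large matrices, of size (150,000 x 120,000).
For this reason I am looking for an efficient, vectorized way to get, for a given matrix, each of its values together with the indices where that value occurs, without doing it in a loop, because I have about 90,000 matrices to process.
Here is an example: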
import numpy as np
matrix_lables = np.random.randint(30, size=(5, 8))
array([[24, 18, 4, 17, 24, 0, 3, 26],
[21, 11, 14, 9, 3, 27, 18, 14],
[25, 26, 27, 16, 26, 27, 21, 26],
[ 3, 29, 28, 2, 22, 10, 29, 28],
[21, 29, 0, 3, 13, 18, 6, 1]])
Then I get the unique values:
unique_labels=np.unique(matrix_lables)
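As a side note, `np.unique` can also return how often each label occurs in the same call, which is the information needed to know how many coordinates belong to each label. A minimal illustration, assuming a small hand-written matrix (`small` is made up for this example and is not part of my data):

```python
import numpy as np

# Hypothetical toy matrix, used only to illustrate return_counts.
small = np.array([[2, 0, 2],
                  [1, 2, 0]])

# labels are sorted; counts[i] is the number of cells equal to labels[i].
labels, counts = np.unique(small, return_counts=True)
print(labels)  # [0 1 2]
print(counts)  # [2 1 3]
```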
For each label, I want the set of indices at which that value occurs in the matrix:
dictionnary = []
for p in unique_labels:
    # (row, col) coordinates of every cell equal to p
    z = np.argwhere(matrix_lables == p)
    label_index = {p: z}
    dictionnary.append(label_index)
How can I avoid doing this in a loop?
It becomes time-consuming when I process thousands of matrices, each of which has about 15,000 labels.
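To make concrete what "avoiding the loop" could look like, here is a rough sketch of one possible approach: sort the flattened matrix once, then cut the sorted positions at the label boundaries. The function name `group_indices_by_label` is hypothetical, and this is only an illustration under the assumption that the matrix and an index array of the same size fit in memory. It is not a tested solution for matrices of the size described above.

```python
import numpy as np

def group_indices_by_label(matrix):
    """Hypothetical helper: return (labels, groups) where groups[i] is an
    (n_i, 2) array of (row, col) coordinates of the cells equal to labels[i]."""
    flat = matrix.ravel()
    # A stable sort keeps the row-major order of positions within each label.
    order = np.argsort(flat, kind="stable")
    labels, counts = np.unique(flat, return_counts=True)
    # Convert the sorted flat positions back to (row, col) pairs.
    coords = np.column_stack(np.unravel_index(order, matrix.shape))
    # Cut the coordinate array at the boundaries between consecutive labels.
    groups = np.split(coords, np.cumsum(counts)[:-1])
    return labels, groups

# Usage on a small random matrix like the one in the example.
demo = np.random.default_rng(0).integers(30, size=(5, 8))
demo_labels, demo_groups = group_indices_by_label(demo)
```

The per-label comparison `matrix == p` scans the whole matrix once per label, whereas this sketch sorts once and splits once. The result is a list of arrays rather than a list of dictionaries; `dict(zip(demo_labels, demo_groups))` would give a label-to-coordinates mapping if that is needed.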
stored_matrices  # the variable that holds the 90,000 matrices
The complete algorithm for processing all the matrices is as follows:
full_dictionnary = []
for tmp_matrix in stored_matrices:
    unique_labels = np.unique(tmp_matrix)
    dictionnary = []
    for p in unique_labels:
        # (row, col) coordinates of every cell equal to p
        z = np.argwhere(tmp_matrix == p)
        label_index = {p: z}
        dictionnary.append(label_index)
    full_dictionnary.append(dictionnary)
Example:
cd = np.random.randint(80, size=(10, 5, 8))
indexes = []
labels = []
for tmp_matrix in cd:
    unique_labels = np.unique(tmp_matrix)
    for p in unique_labels:
        # (row, col) coordinates of every cell equal to p
        z = np.argwhere(tmp_matrix == p)
        indexes.append(z)
        labels.append(p)
Output:
indexes[0] # x and y coordinates
array([[[[ 3, 51, 14, 28, 50, 30, 16, 40],
[20, 63, 31, 7, 39, 14, 38, 12],
[18, 14, 71, 46, 22, 67, 29, 58],
[34, 10, 70, 65, 18, 7, 69, 7],
[57, 76, 63, 61, 12, 58, 28, 70]],
[[ 3, 66, 1, 19, 72, 18, 24, 35],
[56, 68, 50, 26, 47, 48, 42, 18],
[74, 18, 52, 40, 37, 38, 55, 66],
[75, 29, 51, 20, 38, 11, 40, 51],
[39, 71, 51, 63, 72, 24, 48, 24]]]])
labels[0]  # 4
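For a stack of equally shaped matrices such as `cd`, the outer loop over matrices could in principle be removed as well by sorting on a combined (matrix, label) key. The following is only a sketch under stated assumptions: all matrices share one shape, labels are non-negative integers, and the function name `group_indices_stacked` is hypothetical. Whether it scales to 90,000 large matrices is not something this sketch establishes.

```python
import numpy as np

def group_indices_stacked(stack):
    """Hypothetical helper for an integer array of shape (n, rows, cols).
    Returns (matrix_ids, labels, groups): groups[i] holds the (row, col)
    coordinates of labels[i] inside matrix number matrix_ids[i]."""
    n, rows, cols = stack.shape
    flat = stack.reshape(n, -1)
    span = int(flat.max()) + 1  # assumes non-negative labels
    # One integer key per cell, unique for each (matrix, label) pair.
    keys = (np.arange(n)[:, None] * span + flat).ravel()
    order = np.argsort(keys, kind="stable")
    uniq, counts = np.unique(keys, return_counts=True)
    matrix_ids, labels = np.divmod(uniq, span)
    # Position of each sorted cell inside its own matrix, as (row, col).
    pos = order % (rows * cols)
    coords = np.column_stack(np.unravel_index(pos, (rows, cols)))
    groups = np.split(coords, np.cumsum(counts)[:-1])
    return matrix_ids, labels, groups

# Usage on a stack shaped like cd in the example above.
cd_demo = np.random.default_rng(1).integers(80, size=(10, 5, 8))
mat_ids, labs, grps = group_indices_stacked(cd_demo)
```

The groups come out ordered by matrix first and by label second, which is the same order in which the nested loops append to `indexes` and `labels`.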