A function called via multiprocessing produces a list raw_data of tuples (index_i, index_j, some_result). Usually this is a very large list, but here is a short example:
raw_data = [(0, 0, 1.0),
(0, 1, 0.8006688952445984),
(0, 2, 0.7255614995956421),
(0, 3, 0.7885053157806396),
(0, 4, 0.9278563261032104),
(0, 5, 0.8481519222259521),
(0, 6, 0.5808478593826294),
(0, 7, 0.7729462385177612),
(0, 8, 0.4846215844154358),
(0, 9, 0.6634186506271362),
(1, 1, 1.0),
(1, 2, 0.9437128305435181),
(1, 3, 0.9655782580375671),
(1, 4, 0.8094803690910339),
(1, 5, 0.7461609840393066),
(1, 6, 0.6327897906303406),
(1, 7, 0.7813301682472229),
(1, 8, 0.5511380434036255),
(1, 9, 0.7230715155601501),
(2, 2, 1.0),
(2, 3, 0.9496157765388489),
(2, 4, 0.6908014416694641),
(2, 5, 0.6450313925743103),
(2, 6, 0.510845422744751),
(2, 7, 0.6914690732955933),
(2, 8, 0.4440484046936035),
(2, 9, 0.6007179617881775),
(3, 3, 1.0),
(3, 4, 0.7783468961715698),
(3, 5, 0.7336279153823853),
(3, 6, 0.6183328032493591),
(3, 7, 0.7425610423088074),
(3, 8, 0.4954148828983307),
(3, 9, 0.6851986646652222),
(4, 4, 1.0000001192092896),
(4, 5, 0.916759729385376),
(4, 6, 0.6729019284248352),
(4, 7, 0.8551595211029053),
(4, 8, 0.4803779423236847),
(4, 9, 0.7606569528579712),
(5, 5, 0.9999998807907104),
(5, 6, 0.7227450013160706),
(5, 7, 0.8301199078559875),
(5, 8, 0.47183749079704285),
(5, 9, 0.7638712525367737),
(6, 6, 1.0),
(6, 7, 0.8355474472045898),
(6, 8, 0.5089120864868164),
(6, 9, 0.8670180439949036),
(7, 7, 1.0000001192092896),
(7, 8, 0.4481610059738159),
(7, 9, 0.9298642873764038),
(8, 8, 0.9999999403953552),
(8, 9, 0.43459969758987427),
(9, 9, 0.9999998807907104)]
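Now I need to convert raw_data into clean_data (clean_data is already initialized), using the first two values of each tuple as the indices of an element in clean_data and the third value as that element's value: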
for item in raw_data:
    clean_data[item[0]][item[1]] = item[2]
This works, but it takes a lot of time. I am sure it can be done more efficiently, perhaps with numpy.take or numpy.choose, but I have not managed to figure out how. :-(
Answer 0 (score: 2)
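I'm not sure whether this is faster, but here is a NumPy solution that produces a two-dimensional NumPy array filled with the values. It initializes a zero-filled array whose shape is taken from the maximum of the tuples' first and second elements, respectively. In this solution some values remain 0, because your example does not contain a value for every possible combination of x and y. You said your clean_data is already initialized, so you can adapt this code (or ask for a follow-up in the comments if necessary).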
import numpy as np

# convert raw_data once; the columns are (row index, column index, value)
arr = np.array(raw_data)
rows = arr[:, 0].astype(int)
cols = arr[:, 1].astype(int)
vals = arr[:, 2]

# initialize a zero-filled two-dimensional array sized by the largest indices
result = np.zeros((rows.max() + 1, cols.max() + 1))

# update the result array with the values in a single vectorized assignment
result[rows, cols] = vals
result
array([[1. , 0.8006689 , 0.7255615 , 0.78850532, 0.92785633,
0.84815192, 0.58084786, 0.77294624, 0.48462158, 0.66341865],
[0. , 1. , 0.94371283, 0.96557826, 0.80948037,
0.74616098, 0.63278979, 0.78133017, 0.55113804, 0.72307152],
[0. , 0. , 1. , 0.94961578, 0.69080144,
0.64503139, 0.51084542, 0.69146907, 0.4440484 , 0.60071796],
[0. , 0. , 0. , 1. , 0.7783469 ,
0.73362792, 0.6183328 , 0.74256104, 0.49541488, 0.68519866],
[0. , 0. , 0. , 0. , 1.00000012,
0.91675973, 0.67290193, 0.85515952, 0.48037794, 0.76065695],
[0. , 0. , 0. , 0. , 0. ,
0.99999988, 0.722745 , 0.83011991, 0.47183749, 0.76387125],
[0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0.83554745, 0.50891209, 0.86701804],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 1.00000012, 0.44816101, 0.92986429],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.99999994, 0.4345997 ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.99999988]])
#If needed, you can convert this array to list, e.g., result.tolist().
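Since the question says clean_data already exists, the same index-array assignment can be applied to it directly. The following is only a minimal sketch under assumptions: `sample` is a hypothetical three-by-three stand-in for raw_data, and clean_data is assumed to be a pre-allocated square NumPy float array (a nested Python list would have to be converted first). The mirroring step is optional and only makes sense if the matrix is meant to be symmetric.

```python
import numpy as np

# Hypothetical small sample in the same (i, j, value) layout as raw_data.
sample = [(0, 0, 1.0), (0, 1, 0.8), (0, 2, 0.7),
          (1, 1, 1.0), (1, 2, 0.9),
          (2, 2, 1.0)]

# Assumption: clean_data is a pre-allocated square float array.
clean_data = np.zeros((3, 3))

arr = np.asarray(sample, dtype=float)
rows = arr[:, 0].astype(int)
cols = arr[:, 1].astype(int)

# One vectorized assignment replaces the Python loop.
clean_data[rows, cols] = arr[:, 2]

# Optional: mirror the upper triangle into the lower triangle.
clean_data[cols, rows] = arr[:, 2]
```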
Answer 1 (score: 0)
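In the end I decided not to return the indices together with the results of the multiprocessing function. So, with nothing but the results, and with the ideas from @Nan and @AlexK (thanks!), I put these results into an upper triangular matrix and use numpy functions to create the symmetric result matrix.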
clean_data[np.triu_indices(dim)] = raw_data # Upper triangle including the diagonal
clean_data += clean_data.T - np.diag(clean_data.diagonal()) # Full matrix
where dim is the dimension of the clean_data matrix. Now I get the results very, very, very fast.
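For readers who want to try this approach in isolation, here is a small self-contained sketch. It assumes the results arrive as a flat sequence in row-major upper-triangle order, i.e. (0,0), (0,1), ..., (1,1), (1,2), ..., which is the order np.triu_indices produces. The names `values` and the size `dim = 3` are hypothetical and chosen only for illustration.

```python
import numpy as np

dim = 3

# Hypothetical results only (no indices), in row-major upper-triangle order:
# (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
values = np.array([1.0, 0.8, 0.7, 1.0, 0.9, 1.0])

clean_data = np.zeros((dim, dim))

# Fill the upper triangle, diagonal included.
clean_data[np.triu_indices(dim)] = values

# Add the transpose to fill the lower triangle, then subtract the diagonal
# once so that it is not counted twice.
clean_data += clean_data.T - np.diag(clean_data.diagonal())
```

The number of values must equal dim * (dim + 1) / 2; otherwise the assignment into the upper triangle raises an error.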