Python: efficiently extracting data from a list of tuples into another list

Time: 2019-03-20 22:34:40

Tags: python numpy matrix tuples

A function run with multiprocessing produces a list, raw_data, of tuples (index_i, index_j, some_result). Usually this is a large list, but here is a short example:

raw_data = [(0, 0, 1.0),
(0, 1, 0.8006688952445984),
(0, 2, 0.7255614995956421),
(0, 3, 0.7885053157806396),
(0, 4, 0.9278563261032104),
(0, 5, 0.8481519222259521),
(0, 6, 0.5808478593826294),
(0, 7, 0.7729462385177612),
(0, 8, 0.4846215844154358),
(0, 9, 0.6634186506271362),
(1, 1, 1.0), 
(1, 2, 0.9437128305435181), 
(1, 3, 0.9655782580375671), 
(1, 4, 0.8094803690910339), 
(1, 5, 0.7461609840393066), 
(1, 6, 0.6327897906303406), 
(1, 7, 0.7813301682472229), 
(1, 8, 0.5511380434036255), 
(1, 9, 0.7230715155601501), 
(2, 2, 1.0), 
(2, 3, 0.9496157765388489), 
(2, 4, 0.6908014416694641), 
(2, 5, 0.6450313925743103), 
(2, 6, 0.510845422744751), 
(2, 7, 0.6914690732955933), 
(2, 8, 0.4440484046936035), 
(2, 9, 0.6007179617881775), 
(3, 3, 1.0), 
(3, 4, 0.7783468961715698), 
(3, 5, 0.7336279153823853), 
(3, 6, 0.6183328032493591), 
(3, 7, 0.7425610423088074), 
(3, 8, 0.4954148828983307), 
(3, 9, 0.6851986646652222), 
(4, 4, 1.0000001192092896), 
(4, 5, 0.916759729385376), 
(4, 6, 0.6729019284248352), 
(4, 7, 0.8551595211029053), 
(4, 8, 0.4803779423236847), 
(4, 9, 0.7606569528579712), 
(5, 5, 0.9999998807907104), 
(5, 6, 0.7227450013160706), 
(5, 7, 0.8301199078559875), 
(5, 8, 0.47183749079704285), 
(5, 9, 0.7638712525367737), 
(6, 6, 1.0), 
(6, 7, 0.8355474472045898), 
(6, 8, 0.5089120864868164), 
(6, 9, 0.8670180439949036), 
(7, 7, 1.0000001192092896), 
(7, 8, 0.4481610059738159), 
(7, 9, 0.9298642873764038), 
(8, 8, 0.9999999403953552), 
(8, 9, 0.43459969758987427), 
(9, 9, 0.9999998807907104)]

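Now I need to convert raw_data into clean_data (clean_data is already initialized), using the first two values of each tuple as the indices of an element in clean_data and the third value as that element's value: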

for item in raw_data:
    clean_data[item[0]][item[1]] = item[2]

This works, but it takes a lot of time. I'm sure it can be done more efficiently, perhaps with numpy.take or numpy.choose, but I haven't managed to figure out how. :-(

2 answers:

Answer 0: (score: 2)

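I'm not sure whether this is faster, but here is a solution using NumPy that produces a two-dimensional NumPy array filled with the values. A zero-filled array is initialized with a shape based on the maximum values of the first and second tuple elements, respectively. In this solution some values remain 0, because your example does not contain a value for every possible combination of x and y. You said your clean_data is already initialized, so you can adapt this code (or ask a follow-up in the comments if necessary).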

import numpy as np

# convert once; columns 0 and 1 hold the indices, column 2 holds the values
arr = np.array(raw_data)
rows = arr[:, 0].astype(int)
cols = arr[:, 1].astype(int)
vals = arr[:, 2]

# initialize a zero-filled two-dimensional array sized by the largest indices
result = np.zeros((rows.max() + 1, cols.max() + 1))

# update the result array with all values in one indexed assignment
result[rows, cols] = vals

result

array([[1.        , 0.8006689 , 0.7255615 , 0.78850532, 0.92785633,
        0.84815192, 0.58084786, 0.77294624, 0.48462158, 0.66341865],
       [0.        , 1.        , 0.94371283, 0.96557826, 0.80948037,
        0.74616098, 0.63278979, 0.78133017, 0.55113804, 0.72307152],
       [0.        , 0.        , 1.        , 0.94961578, 0.69080144,
        0.64503139, 0.51084542, 0.69146907, 0.4440484 , 0.60071796],
       [0.        , 0.        , 0.        , 1.        , 0.7783469 ,
        0.73362792, 0.6183328 , 0.74256104, 0.49541488, 0.68519866],
       [0.        , 0.        , 0.        , 0.        , 1.00000012,
        0.91675973, 0.67290193, 0.85515952, 0.48037794, 0.76065695],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.99999988, 0.722745  , 0.83011991, 0.47183749, 0.76387125],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 1.        , 0.83554745, 0.50891209, 0.86701804],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 1.00000012, 0.44816101, 0.92986429],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.99999994, 0.4345997 ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.99999988]])

# If needed, you can convert this array to a list, e.g. result.tolist().
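
Since the question says clean_data already exists, the same indexed assignment can write straight into it instead of into a new array. A minimal sketch, assuming clean_data is a NumPy array (not a list of lists) and using a small made-up raw_data in the same (index_i, index_j, value) form:

```python
import numpy as np

# Hypothetical small input, same layout as raw_data in the question
raw_data = [(0, 0, 1.0), (0, 1, 0.8), (0, 2, 0.7),
            (1, 1, 1.0), (1, 2, 0.9),
            (2, 2, 1.0)]

# Assumption: clean_data is an already-initialized NumPy array
clean_data = np.zeros((3, 3))

# Split the list of tuples into three parallel sequences
rows, cols, vals = zip(*raw_data)

# One indexed assignment replaces the Python-level loop
clean_data[np.array(rows), np.array(cols)] = vals
```

Entries that never appear in raw_data keep whatever value clean_data was initialized with.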

Answer 1: (score: 0)

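In the end I decided not to return the indices along with the results of the multiprocessing function. So, with nothing but the results and with the ideas from @Nan and @AlexK (thanks!), I put these results into an upper triangular matrix and use NumPy functions to build the symmetric result matrix.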

clean_data[np.triu_indices(dim)] = raw_data  # upper triangle, including the diagonal
clean_data += clean_data.T - np.diag(clean_data.diagonal())  # mirror into the full symmetric matrix

where dim is the dimension of the clean_data matrix. Now I get the results very, very fast.
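
For anyone trying this, a small self-contained sketch of the same idea. It assumes raw_data is now a flat list of values ordered row by row over the upper triangle (the order np.triu_indices produces) and that clean_data starts as a zero-filled square array; the numbers are made up:

```python
import numpy as np

dim = 3

# Hypothetical results, in the order (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
raw_data = [1.0, 0.8, 0.7, 1.0, 0.9, 1.0]

# Assumption: clean_data is a zero-filled dim x dim array
clean_data = np.zeros((dim, dim))

# Fill the upper triangle, including the diagonal
clean_data[np.triu_indices(dim)] = raw_data

# Add the transpose to fill the lower triangle, then subtract the
# diagonal once so it is not counted twice
clean_data += clean_data.T - np.diag(clean_data.diagonal())
```

This only works if the worker results arrive in exactly that order, so the order has to be preserved when collecting them.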