Reindexing the data in an array so that missing data points are filled with NaN

Time: 2017-09-10 10:02:50

Tags: python arrays pandas numpy nan

I have several arrays that look like this:

[[ 0.          1.          0.73475787  0.36224658  0.08579446 -0.11767365
  -0.09927562  0.17444341  0.47212111  1.00584593  1.69147789  1.89421069
   1.4718292 ]
 [ 2.          1.          0.68744907  0.38420843  0.25922927  0.04719614
   0.00841919  0.21967246  0.22183329  0.28910002  0.54637077 -0.04389335
  -1.33445338]
 [ 3.          1.          0.77854922  0.41093192  0.0713814  -0.08194854
  -0.07885753  0.1491798   0.56297583  1.0759857   1.57149366  1.37958867
   0.64409152]
 [ 5.          1.          0.09182989  0.14988215 -0.1272845   0.12154707
  -0.01194815 -0.06136953  0.18783772  0.46631855  0.78850281  0.64755372
   0.69757144]]

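Note that array[i,0] gives me a count. In this particular array, counts 1, 4 and 6 are missing. In other cases, 2, 3 or 5 might be missing, or nothing might be missing at all.

Now, for my later meta-analysis, I would like the array to contain a row of NaN for each missing count.

For the example above, I would like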

[[ 0.          1.          0.73475787  0.36224658  0.08579446 -0.11767365
  -0.09927562  0.17444341  0.47212111  1.00584593  1.69147789  1.89421069
   1.4718292 ]
 [ 1.          NaN          NaN         NaN        NaN         NaN
   NaN         NaN          NaN         NaN        NaN         NaN
   NaN ]
 [ 2.          1.          0.68744907  0.38420843  0.25922927  0.04719614
   0.00841919  0.21967246  0.22183329  0.28910002  0.54637077 -0.04389335
  -1.33445338]
 [ 3.          1.          0.77854922  0.41093192  0.0713814  -0.08194854
  -0.07885753  0.1491798   0.56297583  1.0759857   1.57149366  1.37958867
   0.64409152]
 [ 4.          NaN          NaN         NaN        NaN         NaN
   NaN         NaN          NaN         NaN        NaN         NaN
   NaN ]
 [ 5.          1.          0.09182989  0.14988215 -0.1272845   0.12154707
  -0.01194815 -0.06136953  0.18783772  0.46631855  0.78850281  0.64755372
   0.69757144]
 [ 6.          NaN          NaN         NaN        NaN         NaN
   NaN         NaN          NaN         NaN        NaN         NaN
   NaN ]]

To reorder my array, I tried the following:

influence_incl_missing = np.ones((len(vec_conc),len(results)+1))
for i, conc in enumerate(vec_conc):
    if i == influence[i,0]:
        influence_incl_missing[i,:] = influence[i,:]
    else:
        influence_incl_missing[i,1:] = np.full(len(results),np.nan)
        influence_incl_missing[i,0] = i

This gives me the obvious error

IndexError: index 4 is out of bounds for axis 0 with size 4

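because len(influence) < len(vec_conc): the loop index i runs over all of vec_conc, but influence has only one row per count that is actually present.

How can I do this in Python?

Many thanks!!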

1 answer:

Answer 0 (score: 0)

Install pandas:

pip install pandas

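Load your data into a pandas DataFrame, set the count column as the index, and reindex over the full range of counts. Rows for counts that are not present are filled with NaN, which should do what you want.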

import pandas as pd

df = pd.DataFrame(arr)  # arr is your array

arr = df.set_index(df.columns[0])\
        .reindex(range(len(vec_conc)))\
        .reset_index().values
arr 
array([[ 0.        ,  1.        ,  0.73475787,  0.36224658,  0.08579446,
        -0.11767365, -0.09927562,  0.17444341,  0.47212111,  1.00584593,
         1.69147789,  1.89421069,  1.4718292 ],
       [ 1.        ,         nan,         nan,         nan,         nan,
                nan,         nan,         nan,         nan,         nan,
                nan,         nan,         nan],
       [ 2.        ,  1.        ,  0.68744907,  0.38420843,  0.25922927,
         0.04719614,  0.00841919,  0.21967246,  0.22183329,  0.28910002,
         0.54637077, -0.04389335, -1.33445338],
       [ 3.        ,  1.        ,  0.77854922,  0.41093192,  0.0713814 ,
        -0.08194854, -0.07885753,  0.1491798 ,  0.56297583,  1.0759857 ,
         1.57149366,  1.37958867,  0.64409152],
       [ 4.        ,         nan,         nan,         nan,         nan,
                nan,         nan,         nan,         nan,         nan,
                nan,         nan,         nan],
       [ 5.        ,  1.        ,  0.09182989,  0.14988215, -0.1272845 ,
         0.12154707, -0.01194815, -0.06136953,  0.18783772,  0.46631855,
         0.78850281,  0.64755372,  0.69757144],
       [ 6.        ,         nan,         nan,         nan,         nan,
                nan,         nan,         nan,         nan,         nan,
                nan,         nan,         nan]])
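
If you would rather not add pandas as a dependency, the same idea can probably be done in plain NumPy. What follows is only a minimal sketch, not part of the original answer. The function name fill_missing_rows and the parameter n_rows are made up for illustration, and the small influence array stands in for the real data. It assumes the counts in column 0 are unique whole numbers between 0 and n_rows - 1.

```python
import numpy as np

def fill_missing_rows(arr, n_rows):
    """Hypothetical helper: place each row at the position given by its count.

    Assumes column 0 holds unique integer-valued counts in [0, n_rows).
    """
    # Start from an array that is NaN everywhere.
    out = np.full((n_rows, arr.shape[1]), np.nan)
    # Every row keeps its count in column 0, even if the rest is missing.
    out[:, 0] = np.arange(n_rows)
    # Counts that are actually present in the input.
    counts = arr[:, 0].astype(int)
    # Copy each existing row into the slot given by its count.
    out[counts] = arr
    return out

# Small stand-in for the array in the question (counts 0, 2, 3 and 5 present).
influence = np.array([[0., 1., 0.73],
                      [2., 1., 0.69],
                      [3., 1., 0.78],
                      [5., 1., 0.09]])

filled = fill_missing_rows(influence, 7)
print(filled)
```

With the data from the question, the call would be fill_missing_rows(influence, len(vec_conc)). Unlike the loop in the question, this never indexes influence with the target row number, so it does not go out of bounds when rows are missing.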