Is there a better way to convert an 'object'-type array to a numpy float array, replacing 'na' with the mean?

Date: 2018-01-09 08:08:57

Tags: python arrays string numpy

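I have an object array of numeric strings (plus some plain floats) in which some elements are 'na'. Because of them, the array cannot be converted to float with x.astype(np.float), as given here.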
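A minimal reproduction of the failure, assuming a shortened stand-in for the question's array. On an object array, astype(float) converts each element roughly as float() would, and float('na') raises ValueError. The sketch uses the built-in float because np.float was deprecated in NumPy 1.20 and removed in 1.24.

```python
import numpy as np

# Shortened stand-in for the question's val_inc (assumption): numeric strings,
# plain floats and the 'na' marker mixed in one object array
x = np.array(['na', '38.012', 43.2611, '38.7816'], dtype=object)

failed = False
try:
    x.astype(float)  # '38.012' and 43.2611 convert fine; 'na' does not
except ValueError as err:
    failed = True
    print("conversion failed:", err)
```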

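Please suggest a better way than the one I used. My code is below (a snippet from my Jupyter notebook; I have shown the intermediate steps only to demonstrate the changes):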

In [4]: val_inc

Out[4]:

array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)

In [5]: val_inc[val_inc == 'na'] = '0'

In [6]: val_inc

Out[6]:

array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)

In [7]: val_inc = val_inc.astype(np.float)

In [8]: val_inc

Out[8]:

array([  0.    ,  38.012 ,  38.7816,  38.0736,  40.7118,  44.7382,
        39.6416,  38.9177,  36.9031,  43.2611,  38.2732,  40.7129,
        37.2844,  39.5835,  43.9194,  42.5485,  36.9052,   0.    ,
        41.9264,  45.3568,  44.6239,  38.1079,  45.2393,  32.785 ,
        44.6239,  38.0216,  38.4608,  42.5644,  35.3127,  33.2936,
        33.0556,  40.4476,  35.6581,  35.5574,  43.1096,  34.4751,
        42.0554,  40.3944,  40.2466,  32.2567,   0.    ,  38.8594,
        43.947 ,  41.7973,  41.8105,  40.3797,  31.2868,  45.3644,
        40.7177,  41.8558,  38.9249,  33.2077,  42.4053,  42.559 ])

In [9]: np.mean(val_inc[val_inc != 0.])

Out[9]: 39.587374509803915

In [10]: val_inc[val_inc == 0.] = np.mean(val_inc[val_inc != 0.])

In [11]: val_inc

Out[11]:

array([ 39.58737451,  38.012     ,  38.7816    ,  38.0736    ,
        40.7118    ,  44.7382    ,  39.6416    ,  38.9177    ,
        36.9031    ,  43.2611    ,  38.2732    ,  40.7129    ,
        37.2844    ,  39.5835    ,  43.9194    ,  42.5485    ,
        36.9052    ,  39.58737451,  41.9264    ,  45.3568    ,
        44.6239    ,  38.1079    ,  45.2393    ,  32.785     ,
        44.6239    ,  38.0216    ,  38.4608    ,  42.5644    ,
        35.3127    ,  33.2936    ,  33.0556    ,  40.4476    ,
        35.6581    ,  35.5574    ,  43.1096    ,  34.4751    ,
        42.0554    ,  40.3944    ,  40.2466    ,  32.2567    ,
        39.58737451,  38.8594    ,  43.947     ,  41.7973    ,
        41.8105    ,  40.3797    ,  31.2868    ,  45.3644    ,
        40.7177    ,  41.8558    ,  38.9249    ,  33.2077    ,
        42.4053    ,  42.559     ])
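One weakness of the approach above is that 0 doubles as the missing-value marker, so any genuine 0.0 reading would also be excluded from the mean and then overwritten by it. A sketch that keeps a boolean mask of the 'na' positions instead, assuming the same kind of mixed object array (shortened here to five elements):

```python
import numpy as np

# Shortened stand-in for the question's val_inc (assumption)
val_inc = np.array(['na', '38.012', 43.2611, 'na', '40.7129'], dtype=object)

# Record which entries are missing *before* converting, instead of using 0 as a sentinel
missing = val_inc == 'na'

# Convert only the observed entries; both '38.012' and 43.2611 cast cleanly to float
values = np.empty(val_inc.shape, dtype=float)
values[~missing] = val_inc[~missing].astype(float)

# Fill the missing slots with the mean of the observed values
values[missing] = values[~missing].mean()
print(values)
```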

2 answers:

Answer 0 (score: 3)

Replace 'na' with 'nan', convert the array to float so that 'nan' becomes np.nan, and then use np.nanmean.

Example:

import numpy as np
test = np.array(['0', '1', 'nan'], dtype=float)   # the string 'nan' parses as np.nan
np.where(np.isnan(test), np.nanmean(test), test)  # each NaN -> mean of the non-NaN values

array([ 0. ,  1. ,  0.5])
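Applied to the question's object array, the steps of this answer might look like the following sketch (the five-element array is a shortened stand-in for val_inc). Replacing 'na' with 'nan' is what lets astype(float) succeed, since float('nan') is valid:

```python
import numpy as np

# Shortened stand-in for the question's val_inc (assumption)
val_inc = np.array(['na', '38.012', 43.2611, 'na', '40.7129'], dtype=object)

# Step 1: turn the 'na' marker into a string that float() reads as NaN
val_inc[val_inc == 'na'] = 'nan'

# Step 2: cast; 'nan' -> nan, numeric strings and floats -> float64
test = val_inc.astype(float)

# Step 3: replace each NaN with the mean of the non-NaN values
filled = np.where(np.isnan(test), np.nanmean(test), test)
print(filled)
```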

Answer 1 (score: 2)

It is best to first convert 'na' to a proper NaN. After that, the data can be processed however you like:

import numpy as np
val_inc[val_inc == 'na'] = np.nan   # 'na' to proper NaN (missing value)
val_inc = val_inc.astype(float)     # no error now; np.float was removed in NumPy 1.24
print(val_inc)

Output:

[     nan  38.012   38.7816  38.0736  40.7118  44.7382  39.6416  38.9177
  36.9031  43.2611  38.2732  40.7129  37.2844  39.5835  43.9194  42.5485
  36.9052      nan  41.9264  45.3568  44.6239  38.1079  45.2393  32.785
  44.6239  38.0216  38.4608  42.5644  35.3127  33.2936  33.0556  40.4476
  35.6581  35.5574  43.1096  34.4751  42.0554  40.3944  40.2466  32.2567
      nan  38.8594  43.947   41.7973  41.8105  40.3797  31.2868  45.3644
  40.7177  41.8558  38.9249  33.2077  42.4053  42.559 ]
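Answer 1 stops once the NaNs are in place. A sketch that finishes the imputation and wraps both steps in a reusable function; fill_na_with_mean and its marker parameter are hypothetical names, not a NumPy API:

```python
import numpy as np

def fill_na_with_mean(arr, marker='na'):
    """Hypothetical helper: cast an object array to float64, treating `marker`
    as missing and replacing missing entries with the mean of the observed ones."""
    out = np.array(arr, dtype=object)     # copy so the caller's array is left untouched
    out[out == marker] = np.nan           # the marker becomes a real NaN (Answer 1's step)
    out = out.astype(float)               # built-in float; np.float no longer exists
    out[np.isnan(out)] = np.nanmean(out)  # fill in place through a boolean mask
    return out

# Usage on a shortened stand-in for the question's val_inc (assumption)
val_inc = np.array(['na', '38.012', 43.2611, 'na', '40.7129'], dtype=object)
result = fill_na_with_mean(val_inc)
print(result)
```

Assigning through the boolean mask fills the array in place, whereas np.where in Answer 0 builds a new array; for an array of this size either works.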