I have an array of strings in which some elements, such as 'na', cannot be converted to float using x.astype(np.float)
as given here.
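A minimal sketch reproducing the failure on a small hypothetical sample (the name sample is illustrative, not from the notebook). It assumes that astype(float) on an object array converts each element much like Python's float(), so a single 'na' entry is enough to abort the whole cast with a ValueError.

```python
import numpy as np

# A small object array mixing strings and floats, similar to val_inc below
sample = np.array(['na', '38.012', 43.2611], dtype=object)

error_message = None
try:
    sample.astype(float)  # 'na' cannot be parsed as a number, so the cast fails
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```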
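Please suggest a better way than what I did. My program is below (it is a snippet from my Jupyter notebook; I have shown the intermediate steps only to demonstrate the changes):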
In [4]: val_inc
Out[4]:
array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
'39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
'37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
'38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
'40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
'41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
'38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [5]: val_inc[val_inc == 'na'] = '0'
In [6]: val_inc
Out[6]:
array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
'39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
'37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
'38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
'40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
'41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
'38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [7]: val_inc = val_inc.astype(np.float)
In [8]: val_inc
Out[8]:
array([ 0. , 38.012 , 38.7816, 38.0736, 40.7118, 44.7382,
39.6416, 38.9177, 36.9031, 43.2611, 38.2732, 40.7129,
37.2844, 39.5835, 43.9194, 42.5485, 36.9052, 0. ,
41.9264, 45.3568, 44.6239, 38.1079, 45.2393, 32.785 ,
44.6239, 38.0216, 38.4608, 42.5644, 35.3127, 33.2936,
33.0556, 40.4476, 35.6581, 35.5574, 43.1096, 34.4751,
42.0554, 40.3944, 40.2466, 32.2567, 0. , 38.8594,
43.947 , 41.7973, 41.8105, 40.3797, 31.2868, 45.3644,
40.7177, 41.8558, 38.9249, 33.2077, 42.4053, 42.559 ])
In [9]: np.mean(val_inc[val_inc != 0.])
Out[9]: 39.587374509803915
In [10]: val_inc[val_inc == 0.] = np.mean(val_inc[val_inc != 0.])
In [11]: val_inc
Out[11]:
array([ 39.58737451, 38.012 , 38.7816 , 38.0736 ,
40.7118 , 44.7382 , 39.6416 , 38.9177 ,
36.9031 , 43.2611 , 38.2732 , 40.7129 ,
37.2844 , 39.5835 , 43.9194 , 42.5485 ,
36.9052 , 39.58737451, 41.9264 , 45.3568 ,
44.6239 , 38.1079 , 45.2393 , 32.785 ,
44.6239 , 38.0216 , 38.4608 , 42.5644 ,
35.3127 , 33.2936 , 33.0556 , 40.4476 ,
35.6581 , 35.5574 , 43.1096 , 34.4751 ,
42.0554 , 40.3944 , 40.2466 , 32.2567 ,
39.58737451, 38.8594 , 43.947 , 41.7973 ,
41.8105 , 40.3797 , 31.2868 , 45.3644 ,
40.7177 , 41.8558 , 38.9249 , 33.2077 ,
42.4053 , 42.559 ])
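One possible refinement of the approach above, sketched under the assumption that the input is a 1-D sequence mixing numeric strings, floats and the marker 'na' (the helper name impute_na_with_mean is hypothetical). Instead of the '0' sentinel, it keeps a boolean mask of the missing entries, so a genuine 0.0 in the data is neither excluded from the mean nor overwritten.

```python
import numpy as np

def impute_na_with_mean(values, missing='na'):
    """Hypothetical helper: replace `missing` entries with the mean of the rest.

    A boolean mask marks the missing entries, so real zeros in the data are
    not mistaken for missing values (unlike the '0' sentinel).
    """
    values = np.asarray(values, dtype=object)
    mask = values == missing                     # elementwise on an object array
    result = np.zeros(values.shape, dtype=float)
    result[~mask] = values[~mask].astype(float)  # convert only the valid entries
    result[mask] = result[~mask].mean()          # fill the gaps with their mean
    return result

print(impute_na_with_mean(['na', '1.0', 3.0, 0.0]))
```

Here the real 0.0 stays in place and contributes to the fill value (4/3). If every entry is 'na', the mean of an empty array is nan and NumPy emits a RuntimeWarning.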
Answer 0 (score: 3)
Replace 'na' with 'nan', convert the array to float so that 'nan' becomes np.nan, and then use np.nanmean.
Example:
import numpy as np
test = np.array(['0','1','nan'], dtype=float)
np.where(np.isnan(test), np.nanmean(test), test)
array([ 0. , 1. , 0.5])
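The same steps applied to a mixed object array like val_inc from the question, as a minimal sketch (fill_na_with_nanmean is a hypothetical name). It assumes the only non-numeric marker is 'na' and relies on the string 'nan' parsing to a float NaN during the cast.

```python
import numpy as np

def fill_na_with_nanmean(values):
    """Hypothetical helper: 'na' -> 'nan' -> np.nan, then fill with np.nanmean."""
    arr = np.array(values, dtype=object)
    arr[arr == 'na'] = 'nan'   # the string 'nan' parses to a float NaN
    arr = arr.astype(float)    # every 'nan' string becomes np.nan
    return np.where(np.isnan(arr), np.nanmean(arr), arr)

print(fill_na_with_nanmean(['na', '38.012', 43.2611, '39.6416']))
```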
Answer 1 (score: 2)
It is better to convert 'na' to a proper NaN first. After that, you can work with the data however you like:
import numpy as np
val_inc[val_inc == 'na'] = np.nan # 'na' to proper NaN or missing value
val_inc = val_inc.astype(float) # no error here now; builtin float, since np.float was removed in NumPy 1.24
print(val_inc)
Output:
[ nan 38.012 38.7816 38.0736 40.7118 44.7382 39.6416 38.9177
36.9031 43.2611 38.2732 40.7129 37.2844 39.5835 43.9194 42.5485
36.9052 nan 41.9264 45.3568 44.6239 38.1079 45.2393 32.785
44.6239 38.0216 38.4608 42.5644 35.3127 33.2936 33.0556 40.4476
35.6581 35.5574 43.1096 34.4751 42.0554 40.3944 40.2466 32.2567
nan 38.8594 43.947 41.7973 41.8105 40.3797 31.2868 45.3644
40.7177 41.8558 38.9249 33.2077 42.4053 42.559 ]
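A short sketch of what "work with the data however you like" might look like once the missing values are real NaNs, using a small hypothetical sample in place of the full val_inc. It counts the gaps, takes a NaN-ignoring mean, fills the gaps with that mean, or alternatively drops them.

```python
import numpy as np

val = np.array(['na', '38.012', 43.2611, 'na', '39.6416'], dtype=object)
val[val == 'na'] = np.nan     # proper missing value
val = val.astype(float)

n_missing = int(np.isnan(val).sum())                 # how many values were missing
mean_valid = np.nanmean(val)                         # mean that ignores NaN
filled = np.where(np.isnan(val), mean_valid, val)    # fill gaps with that mean
valid_only = val[~np.isnan(val)]                     # or drop the missing values

print(n_missing, mean_valid)
print(filled)
```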