Reading quoted float strings from a file with numpy

Posted: 2018-02-11 05:32:06

Tags: python numpy multidimensional-array scipy

I have a text file like this:

"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"
"3.739478886127E-01","1.782759875059E-01","6.490543365479E+01"
"3.298096954823E-01","6.939357519150E-02","2.112392578125E+02"
"-2.319437451661E-02","1.149862855673E-01","2.712340698242E+02"
"-1.015115305781E-01","-1.082316488028E-01","6.532022094727E+01"
"-5.374089814723E-03","1.031072884798E-01","5.510117187500E+02"
"6.748274713755E-02","1.679160743952E-01","4.033969116211E+02"
"1.027429699898E-01","1.379162818193E-02","2.374352874756E+02"
"-1.371455192566E-01","1.483036130667E-01","2.703260498047E+02"
"NULL","NULL","NULL"
"3.968210220337E-01","1.893606968224E-02","2.803018188477E+01"

I tried to read this text file with numpy:

import numpy as np

dat = np.genfromtxt('data.txt', delimiter=',', dtype='str')
print("dat = {}".format(dat))

# now when I try to convert to float
dat = dat.astype(np.float)  # it fails

# try to make it float
dat = np.char.strip(dat, '"').astype(float)

This raises:

File "test.py", line 25, in <module>
    dat = dat.astype(np.float)  # it fails
ValueError: could not convert string to float: '"-3.588920831680E-02"'

How can I fix this error?

Related link:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt

2 answers:

Answer 0 (score: 2)

You can read the file directly with the csv module, which removes the surrounding double quotes for you:

Code:

import csv
import numpy as np

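# csv.reader's default quotechar '"' strips the quotes around each field;
# the with-block closes the file once the array is built
with open('file1', newline='') as f:
    reader = csv.reader(f, delimiter=",")
    data = np.array([[float(i) if i != 'NULL' else np.nan for i in row]
                     for row in reader])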

print(data)

Result:

[[ -3.58892083e-02   1.60188720e-01   1.30230911e+02]
 [  3.73947889e-01   1.78275988e-01   6.49054337e+01]
 [  3.29809695e-01   6.93935752e-02   2.11239258e+02]
 [ -2.31943745e-02   1.14986286e-01   2.71234070e+02]
 [ -1.01511531e-01  -1.08231649e-01   6.53202209e+01]
 [ -5.37408981e-03   1.03107288e-01   5.51011719e+02]
 [  6.74827471e-02   1.67916074e-01   4.03396912e+02]
 [  1.02742970e-01   1.37916282e-02   2.37435287e+02]
 [ -1.37145519e-01   1.48303613e-01   2.70326050e+02]
 [             nan              nan              nan]
 [  3.96821022e-01   1.89360697e-02   2.80301819e+01]]
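A self-contained sketch of the same csv-module approach follows. It parses two rows copied from the question's file out of an in-memory io.StringIO, so it runs without a data file. The helper name read_quoted_floats is hypothetical, and skipping blank lines is an added assumption (a trailing empty line would otherwise produce an empty row and a ragged array).

```python
import csv
import io

import numpy as np


def read_quoted_floats(f):
    """Parse rows of double-quoted numbers into a 2-D float array.

    read_quoted_floats is a hypothetical helper name. f is any text-mode
    file-like object; cells equal to 'NULL' become NaN.
    """
    reader = csv.reader(f, delimiter=",")  # default quotechar '"' removes the quotes
    rows = [[np.nan if cell == "NULL" else float(cell) for cell in row]
            for row in reader
            if row]  # assumption: skip blank lines such as a trailing newline
    return np.array(rows, dtype=float)


# Two rows copied from the question's file, held in memory for the demo.
sample = io.StringIO(
    '"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"\n'
    '"NULL","NULL","NULL"\n'
)
data = read_quoted_floats(sample)
print(data.shape)  # (2, 3)
```

To read the real file, pass an open file handle instead, e.g. `with open('file1', newline='') as f: data = read_quoted_floats(f)`.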

Answer 1 (score: -1)

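The problem is that each of your floats is wrapped in two sets of quotes instead of one: the literal double quotes from the file are still inside every string. NumPy expects your array to hold strings like

'1.45E-02'

Instead, you have something like

'"1.45E-02"'

(note the extra double quotes at the beginning and the end).

So the fix is simply to remove those extra double quotes, which can be done easily as follows:

# dat is the string array returned by np.genfromtxt in the question
dat_new = np.char.replace(dat, '"', '')
# You also need to do something with NULL.
# Here I am just replacing it with 0.
dat_new = np.char.replace(dat_new, 'NULL', '0')
dat_new = dat_new.astype(float)

np.char.replace(np_array, string_to_replace, replacement) basically works as a "find and replace": it replaces every instance of the second argument with the third argument.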
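Below is a runnable sketch of this replace-based fix. It assumes `dat` holds what np.genfromtxt(..., dtype='str') returned in the question (every cell, including NULL, still wrapped in its literal quotes), which is built by hand here so no file is needed. One deliberate change from the answer: NULL is replaced with 'nan' rather than '0', because NumPy's string-to-float cast accepts 'nan', and a NaN keeps the missing row distinguishable from a genuine zero.

```python
import numpy as np

# Assumed input: the string array np.genfromtxt(..., dtype='str') produced in
# the question, where every cell still contains its literal double quotes.
dat = np.array([
    ['"-3.588920831680E-02"', '"1.601887196302E-01"', '"1.302309112549E+02"'],
    ['"NULL"', '"NULL"', '"NULL"'],
])

dat_new = np.char.replace(dat, '"', '')            # drop the literal quotes
dat_new = np.char.replace(dat_new, 'NULL', 'nan')  # mark missing values as NaN, not 0
dat_new = dat_new.astype(float)                    # 'nan' parses to float NaN

print(dat_new.shape)  # (2, 3)
```

NaN rows can then be dropped with `dat_new[~np.isnan(dat_new).any(axis=1)]` or handled with NaN-aware functions such as `np.nanmean`.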