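Converting an imported string to float with numpy's loadtxt

Time: 2016-08-27 20:47:23

Tags: python python-3.x ipython

I am trying to import text from a flat file and convert it to float values in a single line. I saw this post with the same error, but I have not found which characters in my input file are invalid. Or do I have a syntax error?

Importing as strings prints the following: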

data = np.loadtxt(file, delimiter='\t', dtype=str)
print(data[0:2])
... 
[["b'Time'" "b'Percent'"]
 ["b'99'" "b'0.067'"]]

Attempting to import as float:

# Import data as floats and skip the first row: data_float
data_float = np.loadtxt(data, delimiter='\t', dtype=float, skiprows=1)

It throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    data_float = np.loadtxt(data, delimiter='\t', dtype=float, skiprows=1)
  File "<stdin>", line 848, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "<stdin>", line 848, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: b'["b\'99\'" "b\'0.067\'"]'

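By the way, I also saw this post explaining the b character, but I don't think that is the problem.

Additional troubleshooting step suggested by the first answer: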

data = np.loadtxt(file, delimiter="\tb'", dtype=str)

This returns:

array(["b'Time\\tPercent'", "b'99\\t0.067'", "b'99\\t0.133'",
       "b'99\\t0.067'", "b'99\\t0'", "b'99\\t0'", "b'0\\t0.5'",
       "b'0\\t0.467'", "b'0\\t0.857'", "b'0\\t0.5'", "b'0\\t0.357'",
       "b'0\\t0.533'", "b'5\\t0.467'", "b'5\\t0.467'", "b'5\\t0.125'",
       "b'5\\t0.4'", "b'5\\t0.214'", "b'5\\t0.4'", "b'10\\t0.067'",
       "b'10\\t0.067'", "b'10\\t0.333'", "b'10\\t0.333'", "b'10\\t0.133'",
       "b'10\\t0.133'", "b'15\\t0.267'", "b'15\\t0.286'", "b'15\\t0.333'",
       "b'15\\t0.214'", "b'15\\t0'", "b'15\\t0'", "b'20\\t0.267'",
       "b'20\\t0.2'", "b'20\\t0.267'", "b'20\\t0.437'", "b'20\\t0.077'",
       "b'20\\t0.067'", "b'25\\t0.133'", "b'25\\t0.267'", "b'25\\t0.412'",
       "b'25\\t0'", "b'25\\t0.067'", "b'25\\t0.133'", "b'30\\t0'",
       "b'30\\t0.071'", "b'30\\t0'", "b'30\\t0.067'", "b'30\\t0.067'",
       "b'30\\t0.133'"], 
      dtype='<U16')

3 Answers:

Answer 0: (score: 1)

You could try this:

data = np.loadtxt(file, delimiter="\tb'", dtype=str)

since the actual delimiter appears to include the characters "b'".
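
For context, the b'...' wrapper in the printed strings looks like what Python produces when str() is called on a bytes object. Whether that is how loadtxt produced these values is an assumption, not something the post confirms. A minimal sketch of the effect, using a made-up field value:

```python
raw = b"99"                    # a field in bytes form (made-up sample value)

as_text = str(raw)             # str() on bytes keeps the b'...' wrapper
decoded = raw.decode("ascii")  # decoding yields the plain text instead

# The wrapped text is not a valid float literal, the decoded text is.
try:
    float(as_text)
    wrapped_ok = True
except ValueError:
    wrapped_ok = False

value = float(decoded)
```

If this is what happened, the quotes and the b are part of the string representation rather than characters in the file itself.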

Answer 1: (score: 1)

Thanks to everyone who looked at my question. I restarted IPython and can now run the same code without any problem. Here is the code, as above:

data_float = np.loadtxt(file, delimiter='\t', dtype=float, skiprows=1)

Result:

In [1]: data_float
Out[1]: 
array([[  9.90000000e+01,   6.70000000e-02],
       [  9.90000000e+01,   1.33000000e-01],
       [  9.90000000e+01,   6.70000000e-02],
       [  9.90000000e+01,   0.00000000e+00],
       [  9.90000000e+01,   0.00000000e+00],
       [  0.00000000e+00,   5.00000000e-01],
       [  0.00000000e+00,   4.67000000e-01],
       [  0.00000000e+00,   8.57000000e-01],
       [  0.00000000e+00,   5.00000000e-01],
       [  0.00000000e+00,   3.57000000e-01],
       [  0.00000000e+00,   5.33000000e-01],
       [  5.00000000e+00,   4.67000000e-01],
       [  5.00000000e+00,   4.67000000e-01],
       [  5.00000000e+00,   1.25000000e-01],
       [  5.00000000e+00,   4.00000000e-01],
       [  5.00000000e+00,   2.14000000e-01],
       [  5.00000000e+00,   4.00000000e-01],
       [  1.00000000e+01,   6.70000000e-02],
       [  1.00000000e+01,   6.70000000e-02],
       [  1.00000000e+01,   3.33000000e-01],
       [  1.00000000e+01,   3.33000000e-01],
       [  1.00000000e+01,   1.33000000e-01],
       [  1.00000000e+01,   1.33000000e-01],
       [  1.50000000e+01,   2.67000000e-01],
       [  1.50000000e+01,   2.86000000e-01],
       [  1.50000000e+01,   3.33000000e-01],
       [  1.50000000e+01,   2.14000000e-01],
       [  1.50000000e+01,   0.00000000e+00],
       [  1.50000000e+01,   0.00000000e+00],
       [  2.00000000e+01,   2.67000000e-01],
       [  2.00000000e+01,   2.00000000e-01],
       [  2.00000000e+01,   2.67000000e-01],
       [  2.00000000e+01,   4.37000000e-01],
       [  2.00000000e+01,   7.70000000e-02],
       [  2.00000000e+01,   6.70000000e-02],
       [  2.50000000e+01,   1.33000000e-01],
       [  2.50000000e+01,   2.67000000e-01],
       [  2.50000000e+01,   4.12000000e-01],
       [  2.50000000e+01,   0.00000000e+00],
       [  2.50000000e+01,   6.70000000e-02],
       [  2.50000000e+01,   1.33000000e-01],
       [  3.00000000e+01,   0.00000000e+00],
       [  3.00000000e+01,   7.10000000e-02],
       [  3.00000000e+01,   0.00000000e+00],
       [  3.00000000e+01,   6.70000000e-02],
       [  3.00000000e+01,   6.70000000e-02],
       [  3.00000000e+01,   1.33000000e-01]])
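
One detail worth noting: the failing call in the question passes data (the array already loaded as strings) to np.loadtxt, while the call above passes file. That difference, rather than the restart itself, may explain the error, although the post does not confirm it. A small sketch under that assumption, with a StringIO standing in for the real file and a few rows copied from the question's output:

```python
from io import StringIO

import numpy as np

# Stand-in for the tab-separated file (hypothetical, a few rows only).
text = "Time\tPercent\n99\t0.067\n99\t0.133\n0\t0.5\n"

# Passing the file (here a file-like object) and skipping the header works.
data_float = np.loadtxt(StringIO(text), delimiter="\t", dtype=float, skiprows=1)

# Passing an already-loaded string array back to loadtxt is a different
# input. It is expected to fail; the exception type may vary by NumPy version.
data = np.array([["Time", "Percent"], ["99", "0.067"]])
try:
    np.loadtxt(data, delimiter="\t", dtype=float, skiprows=1)
    reparse_error = None
except Exception as exc:
    reparse_error = exc
```

Here data_float holds the parsed numbers, and reparse_error holds whatever the second call raised, if anything.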

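Answer 2: (score: 1)

The problem is that your numbers are quoted. That is, the field is '99', not 99. There are two ways to handle this. You can supply a converter function that strips the quotes and returns a float. Alternatively, you can load the data with the csv module and then pass that data to numpy.

Using a converter function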

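import numpy as np
from io import StringIO

data = """'x'\t'y'
'1'\t'2.5'"""

def unquote(s):
    # loadtxt may hand converters bytes or str, depending on NumPy version and encoding
    if isinstance(s, bytes):
        s = s.decode()
    return float(s.strip("'"))

# Apply the same converter to columns 0 and 1, and skip the header row
arr = np.loadtxt(StringIO(data), dtype=float, delimiter="\t", skiprows=1,
                 converters={0: unquote, 1: unquote})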

Using csv

import csv
import numpy as np
from io import StringIO

data = """'x'\t'y'
'1'\t'2.5'"""

reader = csv.reader(StringIO(data), quotechar="'", delimiter="\t")
next(reader) # skip headers
arr = np.array(list(reader), dtype=float)

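In both examples I used StringIO so that you can easily see the contents of the "file". With real data you can of course pass a filename or file object to np.loadtxt, and an open file object to csv.reader.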
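
As a rough sketch of the same two approaches against a file on disk: the temporary file, its contents, and the helper name unquote are all made up for illustration, and the converter accepts both bytes and str because what loadtxt passes to converters can differ between NumPy versions.

```python
import csv
import os
import tempfile

import numpy as np


def unquote(field):
    """Strip single quotes from a field and return it as a float."""
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.strip("'"))


# Write a small quoted, tab-separated file (made-up sample data).
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("'x'\t'y'\n'1'\t'2.5'\n'3'\t'4.0'\n")
    path = tmp.name

try:
    # np.loadtxt accepts the path directly.
    from_loadtxt = np.loadtxt(path, delimiter="\t", skiprows=1,
                              converters={0: unquote, 1: unquote})

    # csv.reader needs an open file object rather than a path.
    with open(path, newline="") as fh:
        reader = csv.reader(fh, quotechar="'", delimiter="\t")
        next(reader)  # skip the header row
        from_csv = np.array(list(reader), dtype=float)
finally:
    os.remove(path)
```

Both from_loadtxt and from_csv should end up holding the same two-column float array.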