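I am trying to import text from a flat file and convert it to float values in a single line. I have seen this post, which reports the same error, but I still have not found which characters in my input file are invalid. Or do I have a syntax error?
Importing as strings and printing the result: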
data = np.loadtxt(file, delimiter='\t', dtype=str)
print(data[0:2])
...
[["b'Time'" "b'Percent'"]
["b'99'" "b'0.067'"]]
Trying to import as float:
# Import data as floats and skip the first row: data_float
data_float = np.loadtxt(data, delimiter='\t', dtype=float, skiprows=1)
It throws the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
data_float = np.loadtxt(data, delimiter='\t', dtype=float, skiprows=1)
File "<stdin>", line 848, in loadtxt
items = [conv(val) for (conv, val) in zip(converters, vals)]
File "<stdin>", line 848, in <listcomp>
items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: b'["b\'99\'" "b\'0.067\'"]'
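By the way, I also saw this post explaining the b prefix, but I don't think that is the problem here.
Additional troubleshooting step suggested by the first answer: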
data = np.loadtxt(file, delimiter="\tb'", dtype=str)
Returns:
array(["b'Time\\tPercent'", "b'99\\t0.067'", "b'99\\t0.133'",
"b'99\\t0.067'", "b'99\\t0'", "b'99\\t0'", "b'0\\t0.5'",
"b'0\\t0.467'", "b'0\\t0.857'", "b'0\\t0.5'", "b'0\\t0.357'",
"b'0\\t0.533'", "b'5\\t0.467'", "b'5\\t0.467'", "b'5\\t0.125'",
"b'5\\t0.4'", "b'5\\t0.214'", "b'5\\t0.4'", "b'10\\t0.067'",
"b'10\\t0.067'", "b'10\\t0.333'", "b'10\\t0.333'", "b'10\\t0.133'",
"b'10\\t0.133'", "b'15\\t0.267'", "b'15\\t0.286'", "b'15\\t0.333'",
"b'15\\t0.214'", "b'15\\t0'", "b'15\\t0'", "b'20\\t0.267'",
"b'20\\t0.2'", "b'20\\t0.267'", "b'20\\t0.437'", "b'20\\t0.077'",
"b'20\\t0.067'", "b'25\\t0.133'", "b'25\\t0.267'", "b'25\\t0.412'",
"b'25\\t0'", "b'25\\t0.067'", "b'25\\t0.133'", "b'30\\t0'",
"b'30\\t0.071'", "b'30\\t0'", "b'30\\t0.067'", "b'30\\t0.067'",
"b'30\\t0.133'"],
dtype='<U16')
Answer 0: (score: 1)
You could try this:
data = np.loadtxt(file, delimiter="\tb'", dtype=str)
It looks as though the actual delimiter includes the characters "b'"?
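As a side note on where text like "b'99'" can come from: in Python 3, passing a bytes object through str() (rather than decoding it) produces exactly that kind of string. The snippet below is only a small illustration of that behaviour, with made-up variable names; it is not a claim about what loadtxt does internally.

```python
raw = b"99"

as_text = str(raw)       # keeps the b'...' wrapper as literal characters
decoded = raw.decode()   # plain text without the wrapper

try:
    float(as_text)
    wrapper_parses = True
except ValueError:
    # The quotes and the leading b are not part of a valid number.
    wrapper_parses = False

value = float(decoded)
print(repr(as_text), repr(decoded), wrapper_parses, value)
```

If the strings in the array already carry the wrapper, changing the delimiter would not remove it; the bytes need to be decoded, or the file read directly as float.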
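Answer 1: (score: 1)
Thanks to everyone who looked at my question. I restarted IPython and can now run the same code without any problem. This is the same code as above.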
data_float = np.loadtxt(file, delimiter='\t', dtype=float, skiprows=1)
Result:
In [1]: data_float
Out[1]:
array([[ 9.90000000e+01, 6.70000000e-02],
[ 9.90000000e+01, 1.33000000e-01],
[ 9.90000000e+01, 6.70000000e-02],
[ 9.90000000e+01, 0.00000000e+00],
[ 9.90000000e+01, 0.00000000e+00],
[ 0.00000000e+00, 5.00000000e-01],
[ 0.00000000e+00, 4.67000000e-01],
[ 0.00000000e+00, 8.57000000e-01],
[ 0.00000000e+00, 5.00000000e-01],
[ 0.00000000e+00, 3.57000000e-01],
[ 0.00000000e+00, 5.33000000e-01],
[ 5.00000000e+00, 4.67000000e-01],
[ 5.00000000e+00, 4.67000000e-01],
[ 5.00000000e+00, 1.25000000e-01],
[ 5.00000000e+00, 4.00000000e-01],
[ 5.00000000e+00, 2.14000000e-01],
[ 5.00000000e+00, 4.00000000e-01],
[ 1.00000000e+01, 6.70000000e-02],
[ 1.00000000e+01, 6.70000000e-02],
[ 1.00000000e+01, 3.33000000e-01],
[ 1.00000000e+01, 3.33000000e-01],
[ 1.00000000e+01, 1.33000000e-01],
[ 1.00000000e+01, 1.33000000e-01],
[ 1.50000000e+01, 2.67000000e-01],
[ 1.50000000e+01, 2.86000000e-01],
[ 1.50000000e+01, 3.33000000e-01],
[ 1.50000000e+01, 2.14000000e-01],
[ 1.50000000e+01, 0.00000000e+00],
[ 1.50000000e+01, 0.00000000e+00],
[ 2.00000000e+01, 2.67000000e-01],
[ 2.00000000e+01, 2.00000000e-01],
[ 2.00000000e+01, 2.67000000e-01],
[ 2.00000000e+01, 4.37000000e-01],
[ 2.00000000e+01, 7.70000000e-02],
[ 2.00000000e+01, 6.70000000e-02],
[ 2.50000000e+01, 1.33000000e-01],
[ 2.50000000e+01, 2.67000000e-01],
[ 2.50000000e+01, 4.12000000e-01],
[ 2.50000000e+01, 0.00000000e+00],
[ 2.50000000e+01, 6.70000000e-02],
[ 2.50000000e+01, 1.33000000e-01],
[ 3.00000000e+01, 0.00000000e+00],
[ 3.00000000e+01, 7.10000000e-02],
[ 3.00000000e+01, 0.00000000e+00],
[ 3.00000000e+01, 6.70000000e-02],
[ 3.00000000e+01, 6.70000000e-02],
[ 3.00000000e+01, 1.33000000e-01]])
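One thing worth checking when comparing the two calls: the failing call in the question passes data (the string array returned by the earlier loadtxt call) as the first argument, while the call here passes file. That difference, rather than the restart, may be what fixed it. Below is a minimal sketch of the working pattern, assuming a tab-separated file with one header row; the sample text is a made-up stand-in for the real file.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for the tab-separated file from the question.
sample = "Time\tPercent\n99\t0.067\n99\t0.133\n0\t0.5\n"

# Pass the file itself (here a file-like object), not an array that
# was already produced by an earlier loadtxt call.
data_float = np.loadtxt(StringIO(sample), delimiter="\t", dtype=float, skiprows=1)
print(data_float.shape)
print(data_float)
```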
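Answer 2: (score: 1)
The problem is that your numbers are quoted. That is, the field is '99', not 99. There are two ways to handle this. You can supply converter functions that strip the quotes and return a float, or you can load the data with the csv module and then pass that data to numpy.
Using a converter function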
import numpy as np
from io import StringIO
data = """'x'\t'y'
'1'\t'2.5'"""
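# The converter strips the single quotes from each field before calling float.
# s.strip(b"'") assumes loadtxt hands the converter bytes rather than str.
arr = np.loadtxt(StringIO(data), dtype=float, delimiter="\t", skiprows=1,
                 converters=dict.fromkeys([0, 1], lambda s: float(s.strip(b"'"))))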
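Whether a converter receives bytes or str can depend on the NumPy version and on the encoding argument, so a converter that accepts both may be more portable. The following is a sketch under that assumption; unquote_float is a hypothetical helper name, and the sample text is made up.

```python
from io import StringIO

import numpy as np


def unquote_float(field):
    """Strip surrounding single quotes and convert the field to float."""
    # Accept both bytes and str, since either may be passed in.
    if isinstance(field, bytes):
        field = field.decode("latin1")
    return float(field.strip("'"))


text = "'x'\t'y'\n'1'\t'2.5'\n'3'\t'4'\n"
arr = np.loadtxt(StringIO(text), delimiter="\t", skiprows=1,
                 converters={0: unquote_float, 1: unquote_float})
print(arr)
```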
Using csv
import csv
import numpy as np
from io import StringIO
data = """'x'\t'y'
'1'\t'2.5'"""
reader = csv.reader(StringIO(data), quotechar="'", delimiter="\t")
next(reader) # skip headers
arr = np.array(list(reader), dtype=float)
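In both examples I used StringIO so that you can easily see the contents of the "file". With real data you can pass a filename or a file object to np.loadtxt, and an open file object to csv.reader.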
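For completeness, here is roughly how the csv approach might look with a file on disk instead of StringIO. This is a sketch only: the temporary file and its contents are invented for the example, and in practice path would point at your own data file.

```python
import csv
import os
import tempfile

import numpy as np

# Write a small quoted, tab-separated file; the contents are made up.
with tempfile.NamedTemporaryFile(mode="w", suffix=".tsv", delete=False,
                                 newline="") as tmp:
    tmp.write("'x'\t'y'\n'1'\t'2.5'\n'3'\t'4'\n")
    path = tmp.name

try:
    # csv.reader needs an open file object; newline="" is what the csv
    # module documentation recommends when opening files for it.
    with open(path, newline="") as fh:
        reader = csv.reader(fh, quotechar="'", delimiter="\t")
        next(reader)  # skip the header row
        arr = np.array(list(reader), dtype=float)
finally:
    os.remove(path)

print(arr)
```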