Hello, I am trying to run a cell in a Jupyter notebook that loads a txt file. This is what I did:
dataset = numpy.loadtxt("C:/Users/jayjay/learning/try.txt", delimiter=",", skiprows=1)
# split into input (X) and output (Y) variables
X=dataset[:100,2:4]
Y=dataset[:100,4]
When I try to run this code, I get this error:
ValueError Traceback (most recent call last)
<ipython-input-64-d2d2260af43e> in <module>
----> 1 dataset = numpy.loadtxt("C:/Users/jayjay/learning/try.txt", delimiter=",", skiprows=1)
2 # split into input (X) and output (Y) variables
3 X=dataset[:100,2:4]
4 Y=dataset[:100,4]
ValueError: could not convert string to float: 'not 1'
try.txt contains data similar to this:
135,10,125,10,1
230,16,214,19,not 1
226,16,210,19,1
231,16,215,19,not 1
205,16,189,17,not 1
How can I fix this error? I am a self-taught beginner. Can anyone help me?
Answer 0 (score: 1)
Use pandas to read the file:
df = pandas.read_csv(file, sep = ',')
numpydata = df.to_numpy() # will give a numpy array
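As a rough sketch of how that could continue, here are the sample rows from the question read with pandas and split into X and Y. The rows are inlined through io.StringIO only for the demo, and mapping "not 1" to 0 is an assumption about what the labels mean. The real file appears to have a header row (the question uses skiprows=1), so header=None would need to change there.

```python
import io

import pandas as pd

# Sample rows copied from the question (no header line in this snippet).
sample = """135,10,125,10,1
230,16,214,19,not 1
226,16,210,19,1
231,16,215,19,not 1
205,16,189,17,not 1"""

df = pd.read_csv(io.StringIO(sample), header=None)

# Columns 2 and 3 are the features, the same slice as dataset[:, 2:4].
X = df[[2, 3]].to_numpy()

# Column 4 is text; treat "1" as class 1 and anything else as class 0
# (an assumption about the intended encoding).
labels = df[4].astype(str).str.strip()
Y = (labels == "1").astype(int).to_numpy()

print(X)
print(Y)
```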
Answer 1 (score: 1)
It's good that you provided a sample of the file:
In [1]: txt="""135,10,125,10,1
...: 230,16,214,19,not 1
...: 226,16,210,19,1
...: 231,16,215,19,not 1
...: 205,16,189,17,not 1"""
loadtxt accepts a list of strings in place of a file:
In [2]: np.loadtxt(txt.splitlines(),delimiter=',')
...
ValueError: could not convert string to float: 'not 1'
It tries to return a float array, but it has a problem with the 'not 1' strings.
genfromtxt is similar, but it gives nan wherever it cannot create a float:
In [3]: np.genfromtxt(txt.splitlines(),delimiter=',')
Out[3]:
array([[135., 10., 125., 10., 1.],
[230., 16., 214., 19., nan],
[226., 16., 210., 19., 1.],
[231., 16., 215., 19., nan],
[205., 16., 189., 17., nan]])
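If the nan values are only ever produced by 'not 1', one possible follow-up is to replace them with 0 and then slice as in the question. This is a sketch under that assumption: any other unparseable text in the column would also become 0, so it is only safe when 'not 1' is the sole non-numeric value.

```python
import numpy as np

# Sample rows copied from the question.
lines = [
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

data = np.genfromtxt(lines, delimiter=",")

# Fields that could not be parsed as float come back as nan;
# here those are the 'not 1' entries, which are mapped to 0.
label_col = data[:, 4]
Y = np.where(np.isnan(label_col), 0, label_col).astype(int)

# Same feature slice as in the question.
X = data[:, 2:4]

print(Y)
```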
You can skip the problem column:
In [4]: np.loadtxt(txt.splitlines(),delimiter=',', usecols=[0,1,2,3])
Out[4]:
array([[135., 10., 125., 10.],
[230., 16., 214., 19.],
[226., 16., 210., 19.],
[231., 16., 215., 19.],
[205., 16., 189., 17.]])
Or, since you are going to split the array into two arrays anyway:
In [8]: np.genfromtxt(txt.splitlines(),delimiter=',', usecols=[0,1,2,3], dtype=int)
Out[8]:
array([[135, 10, 125, 10],
[230, 16, 214, 19],
[226, 16, 210, 19],
[231, 16, 215, 19],
[205, 16, 189, 17]])
In [9]: np.genfromtxt(txt.splitlines(),delimiter=',', usecols=[4], dtype=None, encoding=None)
Out[9]: array(['1', 'not 1', '1', 'not 1', 'not 1'], dtype='<U5')
dtype=None lets genfromtxt choose an appropriate dtype for each column.
In [10]: np.genfromtxt(txt.splitlines(),delimiter=',', dtype=None, encoding=None)
Out[10]:
array([(135, 10, 125, 10, '1'), (230, 16, 214, 19, 'not 1'),
(226, 16, 210, 19, '1'), (231, 16, 215, 19, 'not 1'),
(205, 16, 189, 17, 'not 1')],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<U5')])
This is a structured array, with a field for each column. And with a more advanced dtype specification:
In [13]: np.genfromtxt(txt.splitlines(),delimiter=',', dtype='4i,U5', encoding=None)
Out[13]:
array([([135, 10, 125, 10], '1'), ([230, 16, 214, 19], 'not 1'),
([226, 16, 210, 19], '1'), ([231, 16, 215, 19], 'not 1'),
([205, 16, 189, 17], 'not 1')],
dtype=[('f0', '<i4', (4,)), ('f1', '<U5')])
In [14]: _['f0']
Out[14]:
array([[135, 10, 125, 10],
[230, 16, 214, 19],
[226, 16, 210, 19],
[231, 16, 215, 19],
[205, 16, 189, 17]], dtype=int32)
In [15]: __['f1']
Out[15]: array(['1', 'not 1', '1', 'not 1', 'not 1'], dtype='<U5')
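To connect the structured array back to the X and Y from the question, the fields can be pulled out by name. A minimal sketch, assuming the default field names f0 to f4 that dtype=None produces, and again assuming that 'not 1' should become 0:

```python
import numpy as np

# Sample rows copied from the question.
lines = [
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

# One field per column: four integer fields and one string field.
rec = np.genfromtxt(lines, delimiter=",", dtype=None, encoding=None)

# Stack the third and fourth columns into a 2-D feature array.
X = np.column_stack([rec["f2"], rec["f3"]])

# Compare the string field against '1' to get 0/1 labels.
Y = (rec["f4"] == "1").astype(int)

print(X)
print(Y)
```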
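So far I haven't tried to parse or convert those 'not 1' strings. We could construct a converter that turns them into a number, such as 0.
If I define a converter function like: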
def foo(astr):
    # the field arrives as bytes or str, depending on the NumPy version and encoding
    if astr in (b'not 1', 'not 1'):
        astr = '0'
    return int(astr)
In [31]: np.genfromtxt(txt.splitlines(),delimiter=',', converters={4:foo}, dtype=int)
Out[31]:
array([[135, 10, 125, 10, 1],
[230, 16, 214, 19, 0],
[226, 16, 210, 19, 1],
[231, 16, 215, 19, 0],
[205, 16, 189, 17, 0]])
Or if the converter returns a float:
def foo(astr):
    # the field arrives as bytes or str, depending on the NumPy version and encoding
    if astr in (b'not 1', 'not 1'):
        astr = '0'
    return float(astr)
In [39]: np.genfromtxt(txt.splitlines(),delimiter=',', converters={4:foo})
Out[39]:
array([[135., 10., 125., 10., 1.],
[230., 16., 214., 19., 0.],
[226., 16., 210., 19., 1.],
[231., 16., 215., 19., 0.],
[205., 16., 189., 17., 0.]])
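Putting the converter idea together with the slicing from the question might look like the sketch below. The header line and its column names are made up for the demo (the question only implies a header through skiprows=1), and label_to_int is a hypothetical helper name. Note that genfromtxt uses skip_header where loadtxt uses skiprows.

```python
import numpy as np


def label_to_int(field):
    """Map '1' to 1 and anything else (such as 'not 1') to 0."""
    # Depending on the NumPy version and encoding, the converter
    # may receive bytes or str, so normalise to str first.
    if isinstance(field, bytes):
        field = field.decode()
    return 1 if field.strip() == "1" else 0


# Hypothetical header line followed by the sample rows from the question.
lines = [
    "a,b,c,d,label",
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

dataset = np.genfromtxt(
    lines,
    delimiter=",",
    skip_header=1,                 # genfromtxt's counterpart of skiprows=1
    converters={4: label_to_int},  # only the last column needs converting
    dtype=int,
)

# Split into input (X) and output (Y) variables, as in the question.
X = dataset[:100, 2:4]
Y = dataset[:100, 4]

print(X)
print(Y)
```

With a real file, the list of lines would be replaced by the file path.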