I am trying to read a text file with pandas' read_csv in Python. My text file looks like this (all values are numeric):
35 61 7 1 0 # with leading white spaces
0 1 1 1 1 1 # with leading white spaces
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'
My Python code is as follows:
import pandas as pd
df = pd.read_csv('example.txt', header=None)
df
The output is as follows:
CParserError: 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
Before dealing with the leading white spaces, I first need to deal with the 'Error tokenizing data' problem. So I changed the code:
import pandas as pd
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)
df
I got the data with leading white spaces as intended, but the data on line 5 has disappeared. The output is as follows:
b'Skipping line 5: expected 1 fields, saw 3\n
35 61 7 1 0 # with leading white spaces as intended
0 1 1 1 1 1 # with leading white spaces as intended
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
# 5th line disappeared (not my intention).
So I tried changing the code as below to get line 5:
import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df
I successfully got the data on line 5, but the leading white spaces on lines 1 and 2 were removed, as shown below:
35 61 7 1 0 # without leading white spaces(not my intention)
0 1 1 1 1 1 # without leading white spaces(not my intention)
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.
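I have seen several posts about preserving leading white spaces with strings, but I could not find a case about preserving leading white spaces with numbers. Thank you for your help.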
Answer 0 (score: 3)
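The key is the separator. If you specify sep as ^, it works. (^ is the start-of-line metacharacter in regular expressions, but pandas treats a single-character sep as a literal delimiter, so this works because ^ never occurs in the data and each line is read as one field with its leading white space intact.)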
s = pd.read_csv('example.txt', header=None, sep='^', squeeze=True)
s
0 35 61 7 1 0
1 0 1 1 1 1 1
2 33 221 22 0 1
3 233 2
4 1(01-02),2(02-03),3(03-04)
Name: 0, dtype: object
s[1]
' 0 1 1 1 1 1'
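
On recent pandas releases the call above may not run as written, because the squeeze keyword of read_csv was deprecated in pandas 1.4 and removed in 2.0. Below is a minimal sketch of an equivalent, not a definitive implementation. It assumes an in-memory stand-in for example.txt: the variable raw and the exact amount of leading white space in it are made up for illustration. It uses DataFrame.squeeze("columns") to turn the single-column result into a Series.

```python
import io

import pandas as pd

# Hypothetical stand-in for example.txt. The amount of leading white space
# on the first two lines is an assumption made for this illustration.
raw = (
    "  35 61 7 1 0\n"
    " 0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233 2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# '^' never occurs in the data, so every line ends up in a single column.
df = pd.read_csv(io.StringIO(raw), header=None, sep="^")

# Collapse the one-column DataFrame into a Series; this replaces the
# squeeze=True keyword that older read_csv versions accepted.
s = df.squeeze("columns")

print(repr(s[1]))  # the second line, with its leading space preserved
```

Any other single character that is guaranteed not to appear in the file would serve the same purpose as ^ here. To read the real file, pass 'example.txt' in place of io.StringIO(raw).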