Question

I am trying to read a file that uses two colons (::) to separate fields within a row and a pipe to separate records. So the data file test.txt might look like this:

testcol1::testcol2|testdata1::testdata2

My code is as follows:

pd.read_table('test.txt', sep='::', lineterminator='|')

This produces the following warning:

C:\Users\jordan\AppData\Local\Enthought\Canopy\User\lib\site-packages\ipykernel\__main__.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.

and the following "parsed" data:

testcol1   testcol2|testdata1   testdata2

...with three columns, one header row, and zero data rows. If I add the engine='c' kwarg, I get the following error:

ValueError: the 'c' engine does not support regex separators

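It seems that pandas treats my :: field separator as a regex pattern, which forces me onto the Python parser, and that parser does not support the lineterminator kwarg. How can I tell pandas to use the C parser and do plain string matching, rather than regex matching, for my field separator?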

Answer 1

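You can read the file with the C engine, which is faster and lets you use the lineterminator parameter, and then split the column names and the data with the vectorized str.split as a post-processing step: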

In [20]:
import pandas as pd
import io
t="""testcol1::testcol2|testdata1::testdata2"""
df = pd.read_csv(io.StringIO(t),  lineterminator=r'|')
df

Out[20]:
     testcol1::testcol2
0  testdata1::testdata2

In [37]:
df1 = df['testcol1::testcol2'].str.split('::', expand=True)
df1.columns = list(df.columns.str.split('::', expand=True)[0])
df1

Out[37]:
    testcol1   testcol2
0  testdata1  testdata2
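
If you need this for more than one file, the two steps above could be wrapped in a small helper. This is only a sketch under a few assumptions: the name read_delimited and its parameters are made up for illustration, the data contains no commas (read_csv still splits on its default sep=','), and the first record holds the column names. The separator is passed through re.escape because str.split treats patterns longer than one character as regular expressions.

```python
import io
import re

import pandas as pd


def read_delimited(source, field_sep="::", record_sep="|"):
    """Hypothetical helper: read records with the C engine, then split fields."""
    # Step 1: let the C engine split on the record separator only.
    # Each record lands as a single string in column 0.
    raw = pd.read_csv(
        source,
        lineterminator=record_sep,
        header=None,
        dtype=str,
        engine="c",
    )
    # Step 2: split every record into fields; escape the separator so it
    # is matched literally even if it contains regex metacharacters.
    parts = raw[0].str.split(re.escape(field_sep), expand=True)
    # Step 3: promote the first record to column names.
    parts.columns = parts.iloc[0].tolist()
    return parts.iloc[1:].reset_index(drop=True)


sample = io.StringIO("testcol1::testcol2|testdata1::testdata2")
print(read_delimited(sample))
```

Note that with lineterminator='|' a trailing newline at the end of the file is treated as ordinary data, so you may want to strip it from the last field.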

Setting pandas.read_table field and record separators
