Question

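I have looked at here and here, which come close to the core issue I believe I am seeing, but those were fixed in other ways.

I am trying to parse a CSV in which one field now needs to contain a comma, so we have to wrap that field in quotes. It is the only quoted field.

Our separator (sep) is a comma, and we are now adding the double quote as the string delimiter (quotechar).

I have boiled it down to the following. It looks to me as if the order in which sep and quotechar are applied is the key issue, so that a line using a quotechar with a sep inside it never works.

The data file, with the last line commented out: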

$ cat simple.csv
column1,column2, column3
one,    two,                three
one,    two,               "three"
#one,    "two, two_again",   three
$

Code:

import pandas as pd
simple_file = "simple.csv"  # path to the data file shown above
df = pd.read_csv(simple_file, sep=',', header=0, comment='#', quotechar='"')
print(df)

Output:

column1  column2                  column3
0     one      two                    three
1     one      two                 "three"

Now add back the last line, which contains the sep character inside a quoted string.

Data file:

$ cat simple.csv
column1,column2, column3
one,    two,                three
one,    two,               "three"
one,    "two, two_again",   three
$

The output is a failure:

pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:22649)()
CParserError: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4

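I believe I want to force pandas to apply the quote delimiter to each line first and the separator second, since it seems to be doing the opposite. I cannot figure out how. Is there a way to tell pandas this that I have not found?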

Answer 1

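The pandas CSV reader is confused because you told it the separator is strictly ',' but your data file also uses spaces as a separator. A quote only opens a quoted field when it is the first character of that field, so the spaces in front of "two, two_again" make the parser treat the quotes as ordinary text and split on the inner comma. Either change the separator or fix the data. With the data as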

column1,column2, column3
one,two,three
one,two,"three"
one,"two, two_again",three

you get the following:

import pandas as pd
print(pd.read_csv("data.csv", header=None))

         0               1         2
0  column1         column2   column3
1      one             two     three
2      one             two     three
3      one  two, two_again     three
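If editing the data file is not practical, another option is to have the parser ignore the spaces that follow each comma. The sketch below is not part of the original answer. It assumes the question's data as-is, held in an in-memory string called raw for illustration, and uses the skipinitialspace parameter of read_csv so that the opening quote becomes the first character of its field.

```python
import io

import pandas as pd

# The question's data, unchanged: spaces follow each comma.
# (Held in a string here for illustration instead of simple.csv.)
raw = '''column1,column2, column3
one,    two,                three
one,    two,               "three"
one,    "two, two_again",   three
'''

# skipinitialspace=True skips the spaces after each delimiter, so the
# quote character is seen at the start of the field and the comma inside
# the quotes is kept as data rather than treated as a separator.
df = pd.read_csv(io.StringIO(raw), sep=",", quotechar='"', skipinitialspace=True)
print(df)
```

With this setting the header is read as three column names without leading spaces, and the last row keeps two, two_again together in a single field.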

Pandas read_csv() with sep and quotechar conflict causes unexpected number of columns
