Question

I want to read the BX-Books table from the Book-Crossing dataset using pandas. When I write:

import pandas as pd

# Load the book information dataset
books = pd.read_csv("BX-CSV-Dump/BX-Books.csv", sep=';')

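I get this error:

CParserError: Error tokenizing data. C error: Expected 8 fields in line 6452, saw 9

How can I fix this? I also tried '\t' as the separator, but that does not work either: in that case every row ends up in a single column, with the values still separated by ";".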

Answer 1

The problem is caused by strings like this one:

"Peterman Rides Again: Adventures Continue with the Real \"J. Peterman\" Through Life &amp; the Catalog Business"

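NOTE: pay attention to &amp; (it contains a ;) and to \"J. Peterman\" (it contains backslash-escaped quote characters). Without an escape character, the parser treats \" as the end of the quoted field, so the ; inside &amp; is then read as a field separator and the row gets a ninth field.

So try this: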

In [34]: df = pd.read_csv(fn, sep=';', escapechar='\\', encoding='CP1252', 
                          low_memory=False)

In [35]: df.shape
Out[35]: (271379, 8)
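
To see the effect without downloading the dataset, here is a minimal sketch on an in-memory sample. The three-column layout, the ISBNs, the first title and the author names are made up for illustration; only the Peterman title comes from the string quoted above. It assumes a pandas version that exposes pd.errors.ParserError (older versions reported the same failure as CParserError).

```python
import io
import pandas as pd

# Made-up three-column sample; only the Peterman title is taken from the answer above.
sample = (
    '"ISBN";"Book-Title";"Book-Author"\n'
    '"0000000001";"A Sample Title";"Sample Author"\n'
    '"0000000002";"Peterman Rides Again: Adventures Continue with the Real '
    '\\"J. Peterman\\" Through Life &amp; the Catalog Business";"Another Author"\n'
)

# Without escapechar, \" closes the quoted field early and the ; in &amp;
# is then treated as a separator, so the row has too many fields.
try:
    pd.read_csv(io.StringIO(sample), sep=';')
    naive_error = None
except pd.errors.ParserError as exc:
    naive_error = str(exc)

# With escapechar='\\', \" is read as a literal quote and the row keeps 3 fields.
fixed = pd.read_csv(io.StringIO(sample), sep=';', escapechar='\\')

print(naive_error)
print(fixed.shape)
print(fixed.loc[1, 'Book-Title'])
```

Note that the parsed title still contains the HTML entity &amp; as plain text; escapechar only deals with the backslash-escaped quotes.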
