Python CSV: handling commas inside a column

Time: 2018-05-03 15:35:52

Tags: python pandas csv

I am processing a CSV file that contains the text of novels.

book_id, title, content
1, book title 1, All Passion Spent is written in three parts, primarily from the view of an intimate observer. 
2, Book Title 2,  In particular Mr FitzGeorge, a forgotten acquaintance from India who has ever since been in love with her, introduces himself and they form a quiet but playful and understanding friendship. It cost 3,4234 to travel. 

The text in the content column contains commas, so pandas.read_csv fails with pandas.errors.ParserError: Error tokenizing data. C error:
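The underlying cause is that the data rows contain more comma-separated fields than the header declares. A minimal sketch using only the standard library csv module makes this visible. The `sample` string is simply the three lines shown above with trailing spaces trimmed.

```python
import csv
import io

# The sample data from the question (trailing spaces trimmed).
sample = (
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2,  In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)

# Count the fields a plain comma-splitting parser sees on each line.
field_counts = [len(row) for row in csv.reader(io.StringIO(sample))]
print(field_counts)  # [3, 4, 6]
```

The header has three fields, but the data rows have four and six, so the rows are inconsistent for any parser that splits on every comma.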

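There are several suggested solutions to this problem, but none of them worked for me. Reading the file as a regular file and then passing it to a DataFrame also failed. Reference: SO - Solution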

1 answer:

Answer 0: (score: 1)

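You can read the file yourself, split each line with str.split(",", 2) so that only the first two commas act as separators, and then convert the result to a DataFrame.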
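The second argument of str.split is maxsplit: with a value of 2, at most two splits are made from the left, so everything after the second comma stays in one piece. A quick illustration on the first data row from the question:

```python
line = ("1, book title 1, All Passion Spent is written in three parts, "
        "primarily from the view of an intimate observer.")

# maxsplit=2 -> at most 3 pieces; later commas remain inside the last piece.
parts = line.split(",", 2)
print(len(parts))  # 3
print(parts[2])
```

This only works when the free-text column is the last one and the earlier columns contain no commas themselves.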

Example:

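import pandas as pd

filename = "books.csv"  # placeholder: path to your CSV file

with open(filename, "r") as infile:
    # The first line holds the column names.
    header = infile.readline().strip().split(",")
    # Split each row at the first two commas only, so commas in the
    # last column (content) stay part of the text; skip blank lines.
    content = [i.strip().split(",", 2) for i in infile if i.strip()]

df = pd.DataFrame(content, columns=header)
print(df)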

Output:

  book_id          title                                            content
0       1   book title 1   All Passion Spent is written in three parts, ...
1       2   Book Title 2    In particular Mr FitzGeorge, a forgotten acq...
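
If the file also needs to be read by other tools, another option is to rewrite it once with proper quoting. The sketch below is a variant of the same idea and not part of the answer above: it assumes that only the last column contains unquoted commas, and the helper name requote_csv and the in-memory streams are hypothetical, chosen for illustration.

```python
import csv
import io

def requote_csv(src, dst, n_columns=3):
    """Copy comma-separated lines from src to dst, quoting fields as needed.

    Assumption: only the last column may contain commas.
    """
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split off the first n_columns - 1 fields; the rest is one field.
        fields = [f.strip() for f in line.split(",", n_columns - 1)]
        writer.writerow(fields)

raw = io.StringIO(
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
)
fixed = io.StringIO()
requote_csv(raw, fixed)

# Every row now parses into exactly three fields.
rows = list(csv.reader(io.StringIO(fixed.getvalue())))
print([len(r) for r in rows])  # [3, 3]
```

When the output is written to a real file instead, the content field is wrapped in double quotes, which pandas.read_csv handles with its default settings.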