Python csv reader does not handle quotes

Date: 2015-03-25 11:12:43

Tags: python csv

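I have a file that I want to parse with the csv reader. It has 12 records, but some columns contain quotes, and the commas, single quotes and newlines inside them make things more complicated. The problem is that the csv reader does not handle the quotes correctly: a quote inside a quoted field is treated as a separate entity. Here is a small sample of what I am working with.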

import csv

ptr = open("myfile")
text = ptr.read()
ptr.close()

for l in csv.reader(text, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(l)

The file contains:

"0","11/21/2013","NEWYORK","USA
 Atlantic ","the person replied \"this quote\" to which i was shocked,
this came as an utter surprise"

"1","10/18/2013","London","UK","please note the message \"next quote\" 
is invalid"

"2","08/11/2014","Paris","France",
"the region is in a very important geo strategic importance"

2 Answers:

Answer 0 (score: 8)

You have to set escapechar on the reader:

csv.reader(..., escapechar='\\')

It is None by default (I don't know why).
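A minimal sketch of what escapechar changes, assuming Python 3 and using io.StringIO as a stand-in for the file. RECORD is record "1" from the question. With the default dialect the backslash is an ordinary character, so the quote right after it closes the quoted field early and the record falls apart into two rows; with escapechar='\\' each \" is read as a literal quote:

```python
import csv
import io

# Record "1" from the question: \" marks an escaped quote, and the last
# quoted field continues onto a second physical line.
RECORD = (
    '"1","10/18/2013","London","UK","please note the message \\"next quote\\" \n'
    'is invalid"\n'
)

# Default dialect (escapechar=None): the backslash is kept as a normal
# character and the quote after it closes the quoted field.
default_rows = list(csv.reader(io.StringIO(RECORD)))

# escapechar='\\': inside a quoted field, \" becomes a literal quote.
escaped_rows = list(csv.reader(io.StringIO(RECORD), escapechar='\\'))

print(len(default_rows))  # 2
print(len(escaped_rows))  # 1
print(escaped_rows[0][4])
```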

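The second problem is that you are initializing the reader incorrectly. csv.reader expects an iterable of lines, such as a file object; a plain string is iterated character by character. So don't pass the string to the reader, pass the stream: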

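import csv

# newline="" lets the csv module handle newlines inside quoted fields (Python 3)
with open("myfile", newline="") as fo:
    reader = csv.reader(
        fo,
        quotechar='"',
        delimiter=',',
        quoting=csv.QUOTE_ALL,
        skipinitialspace=True,
        escapechar='\\'
    )

    for row in reader:
        print(row)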
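To try answer 0's settings without creating myfile, the sketch below inlines the question's sample as SAMPLE (an assumption: the file is exactly those lines, ending in a newline) and reads it through io.StringIO. read_records is a hypothetical helper; besides applying the same reader settings, it drops the empty lists that the reader returns for the blank separator lines:

```python
import csv
import io

# The question's sample, inlined so the sketch runs without "myfile".
SAMPLE = (
    '"0","11/21/2013","NEWYORK","USA\n'
    ' Atlantic ","the person replied \\"this quote\\" to which i was shocked,\n'
    'this came as an utter surprise"\n'
    '\n'
    '"1","10/18/2013","London","UK","please note the message \\"next quote\\" \n'
    'is invalid"\n'
    '\n'
    '"2","08/11/2014","Paris","France",\n'
    '"the region is in a very important geo strategic importance"\n'
)


def read_records(stream):
    """Parse with answer 0's reader settings and skip blank-line rows."""
    reader = csv.reader(
        stream,
        quotechar='"',
        delimiter=',',
        quoting=csv.QUOTE_ALL,
        skipinitialspace=True,
        escapechar='\\',
    )
    # A blank line is returned as an empty list; filter those out.
    return [row for row in reader if row]


rows = read_records(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

Records "0" and "1" come back intact, with the escaped quotes and embedded newlines preserved. Record "2" comes back as two rows, ['2', '08/11/2014', 'Paris', 'France', ''] followed by the region text, because the newline after its trailing comma sits outside any quotes and so ends the record. The blank-line split used in the next answer does not have this issue.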

Answer 1 (score: 1)

Using the re module:

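import re

with open('file') as f:
    # records in the file are separated by one or more blank lines
    records = re.split(r'\n\n+', f.read())
    for record in records:
        # each match is a "..." field that may contain backslash-escaped quotes
        print(re.findall(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"', record))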

Output:

['"0"', '"11/21/2013"', '"NEWYORK"', '"USA\n Atlantic "', '"the person replied \\"this quote\\" to which i was shocked,\nthis came as an utter surprise"']
['"1"', '"10/18/2013"', '"London"', '"UK"', '"please note the message \\"next quote\\" \nis invalid"']
['"2"', '"08/11/2014"', '"Paris"', '"France"', '"the region is in a very important geo strategic importance"']
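The output above still carries each field's surrounding quotes and the backslashes. A small follow-up sketch, assuming the same blank-line-separated layout as the question's sample, strips the outer quotes and turns each \" back into a plain quote. SAMPLE and parse_records are illustrative names, not part of the answer:

```python
import re

# The question's sample, inlined so the sketch runs without a file.
SAMPLE = (
    '"0","11/21/2013","NEWYORK","USA\n'
    ' Atlantic ","the person replied \\"this quote\\" to which i was shocked,\n'
    'this came as an utter surprise"\n'
    '\n'
    '"1","10/18/2013","London","UK","please note the message \\"next quote\\" \n'
    'is invalid"\n'
    '\n'
    '"2","08/11/2014","Paris","France",\n'
    '"the region is in a very important geo strategic importance"\n'
)

# Answer 1's pattern: a double-quoted string whose delimiting quotes are not
# preceded by a backslash, with backslash-escaped quotes allowed inside.
FIELD_RE = re.compile(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"')


def parse_records(text):
    """Split on blank lines, then unquote and unescape every field."""
    records = []
    for chunk in re.split(r'\n\n+', text):
        fields = FIELD_RE.findall(chunk)
        if fields:
            # Drop the outer quotes and replace each backslash-quote pair.
            records.append([f[1:-1].replace('\\"', '"') for f in fields])
    return records


records = parse_records(SAMPLE)
for record in records:
    print(record)
```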