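I have a file that I want to parse with the CSV reader. It has 12 rows, but some columns contain quotes, and commas, single quotes and newlines make it even more complicated. The problem is that the CSV reader does not handle the quotes correctly: quotes inside quotes are treated as a separate entity. Here is a small sample of what I'm working with: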
import csv

ptr = open("myfile")
text = ptr.read()
ptr.close()
for l in csv.reader(text, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(l)
The file contains:
"0","11/21/2013","NEWYORK","USA
Atlantic ","the person replied \"this quote\" to which i was shocked,
this came as an utter surprise"
"1","10/18/2013","London","UK","please note the message \"next quote\"
is invalid"
"2","08/11/2014","Paris","France",
"the region is in a very important geo strategic importance"
Answer 0 (score: 8)
You have to set escapechar on the reader:
csv.reader(..., escapechar='\\')
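It defaults to None, because the default dialect expects an embedded quote to be doubled ("") rather than escaped with a backslash.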
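A minimal sketch of the difference escapechar makes. It uses one record from the question joined onto a single line; the LINE string is an illustrative assumption, not the exact file contents:

```python
import csv

# One record from the question, joined onto a single line for this demo.
LINE = r'"1","10/18/2013","London","UK","please note the message \"next quote\" is invalid"'

# Default dialect: escapechar is None, so the backslash is ordinary text and the
# quote right after it ends the quoted section early, mangling the last field.
default_row = next(csv.reader([LINE]))

# With escapechar='\\', the reader reads \" as a literal quote inside the field.
escaped_row = next(csv.reader([LINE], escapechar='\\'))

print(default_row)
print(escaped_row)
# ['1', '10/18/2013', 'London', 'UK', 'please note the message "next quote" is invalid']
```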
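Second, you are initializing the reader incorrectly. Don't pass the reader a string; pass it a stream such as a file object. csv.reader iterates over its argument line by line, and iterating over a string yields single characters: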
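import csv

# newline='' lets the csv module handle line breaks inside quoted fields (Python 3)
with open("myfile", newline='') as fo:
    reader = csv.reader(
        fo,
        quotechar='"',
        delimiter=',',
        quoting=csv.QUOTE_ALL,
        skipinitialspace=True,
        escapechar='\\'
    )
    for row in reader:
        print(row)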
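To try these settings without a file on disk, the sketch below feeds the sample through io.StringIO (standing in for myfile, an assumption for the demo) and also shows what happens when the raw string is passed instead. Note that with the sample exactly as shown above, the line break after "France", sits outside any quotes, so the reader ends that record there and the final field comes back as a row of its own.

```python
import csv
import io

# The sample from the question, kept verbatim; the raw string preserves the \" escapes.
SAMPLE = r'''"0","11/21/2013","NEWYORK","USA
Atlantic ","the person replied \"this quote\" to which i was shocked,
this came as an utter surprise"
"1","10/18/2013","London","UK","please note the message \"next quote\"
is invalid"
"2","08/11/2014","Paris","France",
"the region is in a very important geo strategic importance"'''

READER_OPTIONS = dict(
    quotechar='"',
    delimiter=',',
    quoting=csv.QUOTE_ALL,
    skipinitialspace=True,
    escapechar='\\',
)

# io.StringIO stands in for open("myfile", newline=''): an iterable of lines.
rows = list(csv.reader(io.StringIO(SAMPLE, newline=''), **READER_OPTIONS))
for row in rows:
    print(row)

# Passing the string itself makes csv.reader iterate over single characters,
# so each character is parsed as if it were a line of its own.
char_rows = list(csv.reader(SAMPLE, **READER_OPTIONS))
print(len(rows), 'rows from the stream,', len(char_rows), 'rows from the string')
```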
Answer 1 (score: 1)
Using the re module:
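import re

with open('file') as f:
    # Records are assumed to be separated by one or more blank lines.
    records = re.split(r'\n\n+', f.read())

for record in records:
    # Match each quoted field; the content may contain \" escapes.
    print(re.findall(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"', record))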
Output:
['"0"', '"11/21/2013"', '"NEWYORK"', '"USA\n Atlantic "', '"the person replied \\"this quote\\" to which i was shocked,\nthis came as an utter surprise"']
['"1"', '"10/18/2013"', '"London"', '"UK"', '"please note the message \\"next quote\\" \nis invalid"']
['"2"', '"08/11/2014"', '"Paris"', '"France"', '"the region is in a very important geo strategic importance"']
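The matches above still carry their surrounding quotes and the backslash escapes. A possible post-processing step is sketched below. It assumes, as the re.split call does, that records are separated by blank lines, so the sample here adds those blank lines; parse_records is a hypothetical helper name, not part of the original answer:

```python
import re

# Sample records, assumed to be separated by blank lines (the re.split call
# in the answer relies on that layout).
SAMPLE = r'''"0","11/21/2013","NEWYORK","USA
Atlantic ","the person replied \"this quote\" to which i was shocked,
this came as an utter surprise"

"1","10/18/2013","London","UK","please note the message \"next quote\"
is invalid"

"2","08/11/2014","Paris","France",
"the region is in a very important geo strategic importance"'''

# Same pattern as the answer: a quoted field that may contain \" escapes.
FIELD_RE = re.compile(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"')


def parse_records(text):
    """Split on blank lines, then clean up each quoted match."""
    records = []
    for block in re.split(r'\n\n+', text):
        fields = FIELD_RE.findall(block)
        # Drop the surrounding quotes and turn \" back into a plain quote.
        records.append([f[1:-1].replace('\\"', '"') for f in fields])
    return records


for record in parse_records(SAMPLE):
    print(record)
```

This approach still breaks if a quoted field itself contains a blank line, because the split happens before any quote handling.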