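Python regex to remove some, but not all, quotes from a CSV file

Time: 2014-05-01 14:42:33

Tags: python regex csv double-quotes

I have a .csv file in which every field is delimited by double quotes, but some fields contain stray double quotes. UPDATE: my original example was slightly off, so I am now including two rows; the second row is the problematic one. In my original post I left off the final double quote, and that is where the first solution falls short: it would otherwise work, but it removes the quote before the \n: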

"20135025373","25","2013-08-24 00:00:00","WOOD","CHRISTY","","","2679 W. LONG CIRCLE","","LITTLETON","CO","80120","","3510862","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""

"20135025373","10","2013-08-24 00:00:00","DAVIS","JOHN","","","2822 THIRD"","","BOULDER","CO","80304","","3510863","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""

I tried this code, but it also removes the quotes at the beginning and end of each line.

import re

with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
    new.writelines(re.sub(r'(?<!,)"(?!,)', '', line) for line in old)
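
To reproduce the behavior described above, here is a minimal sketch that applies the same pattern to a single row. The `sample` string is a shortened, hypothetical row modeled on the second sample line, not data taken from the file.

```python
import re

# Hypothetical shortened row, modeled on the second sample line:
# the stray quote sits right after "2822 THIRD".
sample = '"DAVIS","2822 THIRD"","","BOULDER",""\n'

# Same pattern as in the question: any quote with no comma directly
# before it and no comma directly after it is removed.
result = re.sub(r'(?<!,)"(?!,)', '', sample)

print(repr(sample))
print(repr(result))
```

The stray quote is removed as intended, but so are the first quote of the line and the quote just before the newline, because neither of them has a comma on either side.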

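Any ideas are appreciated!

3 answers:

Answer 0: (score: 1)

If you can use the csv module, first take a look at Removing in-field quotes in csv file

If you want to do this with a regular expression, I think this should be enough.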

re.sub(r'(?<=[^,])"(?=[^,])', '', line)

See the working Demo
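
As the question's update notes, this pattern still removes the closing quote that sits right before the `\n`, since a newline is not a comma. One possible adjustment, offered as a sketch rather than as part of the original answer, is to exclude the newline in the lookahead as well. The names `STRAY_QUOTE` and `strip_stray_quotes` and the `sample` row are hypothetical.

```python
import re

# Assumed variant of the pattern above: the lookahead also rejects "\n",
# so the closing quote at the end of a line is left in place.
STRAY_QUOTE = re.compile(r'(?<=[^,])"(?=[^,\n])')


def strip_stray_quotes(line):
    """Remove quotes that have a non-comma character on both sides."""
    return STRAY_QUOTE.sub('', line)


# Hypothetical shortened row modeled on the second sample line.
sample = '"DAVIS","2822 THIRD"","","BOULDER",""\n'
cleaned = strip_stray_quotes(sample)
print(cleaned, end='')
```

This only helps when the stray quote is not directly next to a comma inside the field. A field such as `"SMITH, "BOB""` cannot be told apart from two separate fields by a pattern like this.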

Answer 1: (score: 0)

If you don't want to match the quotes at the beginning and end of the line, you can use this regular expression:

(?<!,|^)\"(?!,|$)

instead of:

(?<!,)"(?!,)

See the demo here: http://regex101.com/r/cI7mW5
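
One caveat when moving this pattern from the demo into Python: as far as I know, the `re` module requires lookbehinds to have a fixed width, and `(?<!,|^)` mixes a one-character branch with a zero-width one, so it is likely to be rejected. The sketch below splits the lookbehind into two separate assertions, which should express the same idea. The names `INNER_QUOTE` and `remove_inner_quotes` and the `sample` row are hypothetical.

```python
import re

# Assumed Python-friendly rewrite of the pattern above: two fixed-width
# lookbehinds instead of one alternation. Without re.MULTILINE, "$" also
# matches just before a trailing newline, which protects the last quote.
INNER_QUOTE = re.compile(r'(?<!,)(?<!^)"(?!,|$)')


def remove_inner_quotes(line):
    """Drop quotes that are not next to a comma or a line boundary."""
    return INNER_QUOTE.sub('', line)


# Hypothetical shortened row modeled on the second sample line.
sample = '"DAVIS","2822 THIRD"","","BOULDER",""\n'
cleaned = remove_inner_quotes(sample)
print(cleaned, end='')
```

Because the file is processed one line at a time, `^` and `$` here refer to the boundaries of each individual line.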

Answer 2: (score: 0)

Can you use the csv module instead of re? It may already have this kind of intelligence built in.

The csv module has frustrated me before. The following code is untested, but it may give you a starting point.

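import csv

# Python 3: newline='' lets the csv module handle line endings itself.
with open('ColoSOS/2014_ContData.csv', newline='') as old, \
        open('2014contx.csv', 'w', newline='') as new:
    reader = csv.reader(old, delimiter=',', quotechar='"')
    # Re-quote every field on output, matching the style of the input.
    writer = csv.writer(new, quoting=csv.QUOTE_ALL)
    writer.writerows(reader)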

Reference: https://docs.python.org/2/library/csv.html
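
Whether the csv module copes with the stray quote on its own is worth checking before relying on it. The sketch below is a quick experiment under stated assumptions: `raw` is a hypothetical shortened row modeled on the second sample line, and the cleanup regex is an assumed variant of the one from answer 0 with `\n` added to the lookahead. With the default dialect, a doubled quote inside a quoted field is read as an escaped quote, so the unprocessed row appears to come back with fewer fields than expected.

```python
import csv
import io
import re

# Hypothetical shortened row: five fields, stray quote after "2822 THIRD".
raw = '"DAVIS","2822 THIRD"","","BOULDER","CO"\n'

# Parse the row as-is with the default dialect.
raw_row = next(csv.reader(io.StringIO(raw)))

# Strip the stray quote first (assumed variant of the regex from answer 0),
# then parse again.
cleaned = re.sub(r'(?<=[^,])"(?=[^,\n])', '', raw)
clean_row = next(csv.reader(io.StringIO(cleaned)))

print(len(raw_row), raw_row)
print(len(clean_row), clean_row)
```

If the first count is lower than five, the reader has merged neighboring fields, which suggests cleaning the lines before handing them to csv rather than expecting the parser to repair them.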