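I have a .csv file in which every field is enclosed in double quotes, but some fields contain stray double quotes. UPDATE: my first example was slightly off, so I have included two rows; the second row is the problem one. In the original post I left off the final double quote, and that is where the first solution falls short: it otherwise works, but it removes the quote before the \n: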
"20135025373","25","2013-08-24 00:00:00","WOOD","CHRISTY","","","2679 W. LONG CIRCLE","","LITTLETON","CO","80120","","3510862","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""
"20135025373","10","2013-08-24 00:00:00","DAVIS","JOHN","","","2822 THIRD"","","BOULDER","CO","80304","","3510863","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""
I tried this code, but it also removes the quotes at the start and end of each line.
import re
with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
    new.writelines(re.sub(r'(?<!,)"(?!,)', '', line) for line in old)
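Any ideas are appreciated!
Answer 0 (score: 1)
If you can use the csv module, first take a look at "Removing in-field quotes in csv file".
If you would rather do this with a regular expression, I think this is enough: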
re.sub(r'(?<=[^,])"(?=[^,])', '', line)
See the working demo.
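As a rough illustration of how this pattern behaves, here is a sketch run against a shortened version of the problem row (the sample string and the helper names are made up for this example). The look-behind needs some non-comma character before the quote, so the very first quote on the line survives. The quote just before the trailing newline, however, is followed by `\n`, which is not a comma, so it gets stripped. One possible tweak, assuming lines end in `\n` or `\r\n`, is to exclude line-ending characters in the look-ahead as well.

```python
import re

# Pattern from the answer: a quote with a non-comma character on both sides.
ORIGINAL = re.compile(r'(?<=[^,])"(?=[^,])')

# Assumed tweak (not from the answer): also refuse to match before a line ending.
TWEAKED = re.compile(r'(?<=[^,])"(?=[^,\r\n])')


def strip_original(line):
    """Apply the answer's pattern unchanged."""
    return ORIGINAL.sub('', line)


def strip_tweaked(line):
    """Apply the variant that leaves the quote before the newline alone."""
    return TWEAKED.sub('', line)


# Shortened, hypothetical version of the problem row, newline included.
sample = '"DAVIS","2822 THIRD"","","BOULDER",""\n'

print(repr(strip_original(sample)))  # the closing quote before \n is lost
print(repr(strip_tweaked(sample)))   # the closing quote before \n is kept
```

Neither variant touches a stray quote that sits directly next to a comma, so rows with that kind of damage would still need a separate pass.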
Answer 1 (score: 0)
If you don't want to match the quotes at the start and end of the line, you can use this regex:
(?<!,|^)\"(?!,|$)
instead of:
(?<!,)"(?!,)
See the demo here: http://regex101.com/r/cI7mW5
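One caveat if the pattern is used from Python rather than on regex101: as far as I know, Python's built-in re module rejects `(?<!,|^)` because the two look-behind alternatives have different widths. A sketch of an equivalent form splits each alternation into its own lookaround. The names `STRAY_QUOTE` and `strip_stray_quotes` are hypothetical, and the sample row is shortened.

```python
import re

# Same idea as (?<!,|^)"(?!,|$), with each alternation split into a separate
# lookaround so that every look-behind has a fixed width.
STRAY_QUOTE = re.compile(r'(?<!,)(?<!^)"(?!,)(?!$)')


def strip_stray_quotes(line):
    """Remove quotes that touch neither a comma nor a line boundary on either side."""
    return STRAY_QUOTE.sub('', line)


# Shortened, hypothetical version of the problem row, newline included.
with_newline = '"DAVIS","2822 THIRD"","","BOULDER",""\n'
print(repr(strip_stray_quotes(with_newline)))
```

Without the MULTILINE flag, `$` also matches just before a newline at the very end of the string, which is why the closing quote before `\n` is left alone when the file is processed one line at a time.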
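Answer 2 (score: 0)
Can you use the csv module instead of re? It may already have this kind of intelligence built in.
The following code is untested, but it may give you a starting point.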
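import csv
# newline='' lets the csv module handle line endings itself
with open('ColoSOS/2014_ContData.csv', newline='') as old, open('2014contx.csv', 'w', newline='') as new:
    reader = csv.reader(old, delimiter=',', quotechar='"')
    # Quote every field on output, as in the input file
    writer = csv.writer(new, quoting=csv.QUOTE_ALL)
    writer.writerows(reader)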
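One thing worth checking before relying on csv alone: with its default settings, csv.reader treats a doubled quote inside a quoted field as an escaped literal quote, so a row like the problem one may come back with fields merged rather than repaired. Below is a minimal sketch of such a check on in-memory data. The shortened sample rows, `EXPECTED_FIELDS`, and the helper name are assumptions made for illustration.

```python
import csv
import io

EXPECTED_FIELDS = 5  # hypothetical: number of columns in the shortened sample


def rows_with_wrong_width(text, expected=EXPECTED_FIELDS):
    """Return (row_number, field_count) for rows that parse to an unexpected width."""
    reader = csv.reader(io.StringIO(text), delimiter=',', quotechar='"')
    return [
        (row_number, len(row))
        for row_number, row in enumerate(reader, start=1)
        if len(row) != expected
    ]


# Row 1 is clean; row 2 has a stray quote after THIRD, as in the question.
sample = (
    '"WOOD","2679 W. LONG CIRCLE","","LITTLETON","CO"\n'
    '"DAVIS","2822 THIRD"","","BOULDER","CO"\n'
)

for row_number, width in rows_with_wrong_width(sample):
    print(f"row {row_number}: parsed {width} fields, expected {EXPECTED_FIELDS}")
```

If rows are flagged, one option is to run one of the regex clean-ups from the other answers on each line first and pass the cleaned lines to csv.reader afterwards.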