I have a huge CSV generated by a Python script. Some cells contain arrays of data, while others contain single-item arrays. Some examples:
cell01 == ['"July, 2002"', 'CUREE Publication No. CEA-01.', 'Project No. 3126', 'Prepared for Consortium of Universities for Research in Earthquake Engineering.']
cell02 == ['[Memorandum from Ralph J. Johnson on Andy Place].']
cell03 == ["Financial statements for the years ended March 31, 1991 and 1990 and independent auditors' report"]
Ideally, I would like to parse all of this data into a structure like the following:
cell01_parsed[0] == '"July, 2002"'
cell01_parsed[1] == 'CUREE Publication No. CEA-01.'
cell01_parsed[2] == 'Project No. 3126'
cell01_parsed[3] == 'Prepared for Consortium of Universities for Research in Earthquake Engineering.'
cell02_parsed == '[Memorandum from Ralph J. Johnson on Andy Place].'
cell03_parsed == 'Financial statements for the years ended March 31, 1991 and 1990 and independent auditors\' report'
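However, when I use csv.reader() or csv.DictReader(), these cells come back as strings rather than arrays. What is a simple way to do this? I can't use split(',') because some of the strings contain commas in the middle of an item.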
Answer 0 (score: 0)
You can try splitting the string with a regular expression (you will need to work out one that fits your data), like this:
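import re
# Split only on commas that have no double quote anywhere after them.
# Caveat: a comma that precedes a later quoted item is not split either.
test_str = '"July, 2002", CUREE Publication No. CEA-01.'
print(re.compile(r',(?!.+")').split(test_str))
# ['"July, 2002"', ' CUREE Publication No. CEA-01.']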
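The example cells in the question look like the text Python produces for a list (its `repr`). If that assumption holds for the whole file, a regex may not be needed: `ast.literal_eval` from the standard library can turn each cell back into a list. The sketch below is only illustrative. The helper name `parse_cell` and the in-memory sample row are made up for the demonstration, and unwrapping single-item lists into a bare string is a choice based on the desired output shown in the question.

```python
import ast
import csv
import io


def parse_cell(cell):
    """Parse a cell holding a Python list literal; return other cells unchanged."""
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return cell  # not a Python literal, keep the raw text
    if not isinstance(value, list):
        return cell  # e.g. a bare number: leave it as the original string
    # Single-item lists are unwrapped to match the output asked for in the question.
    return value[0] if len(value) == 1 else value


# Hypothetical sample data, written the way the question's CSV appears to be:
# each cell is the string form of a Python list.
cells = [
    ['"July, 2002"', 'CUREE Publication No. CEA-01.', 'Project No. 3126'],
    ['[Memorandum from Ralph J. Johnson on Andy Place].'],
]
buffer = io.StringIO()
csv.writer(buffer).writerow([str(c) for c in cells])
buffer.seek(0)

# csv.reader hands back each cell as one string; parse_cell rebuilds the lists.
row = next(csv.reader(buffer))
parsed = [parse_cell(c) for c in row]
print(parsed[0][0])
print(parsed[1])
```

Because `ast.literal_eval` only evaluates literals, it is safer than `eval` for data read from a file, but it will not help if the cells were written in some other format (JSON with double quotes happens to parse too, but arbitrary delimited text will not).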