Question

I have a huge CSV generated by a Python script. Some cells contain arrays of data, while others contain single-item arrays. Some examples:

cell01 == ['"July, 2002"', 'CUREE Publication No. CEA-01.', 'Project No. 3126', 'Prepared for Consortium of Universities for Research in Earthquake Engineering.']
cell02 == ['[Memorandum from Ralph J. Johnson on Andy Place].']
cell03 == ["Financial statements for the years ended March 31, 1991 and 1990 and independent auditors' report"]

Ideally, I'd like to parse all of this data into a structure like the following:

cell01_parsed[0] == '"July, 2002"'
cell01_parsed[1] == 'CUREE Publication No. CEA-01.'
cell01_parsed[2] == 'Project No. 3126'
cell01_parsed[3] == 'Prepared for Consortium of Universities for Research in Earthquake Engineering.'

cell02_parsed == '[Memorandum from Ralph J. Johnson on Andy Place].'

cell03_parsed == 'Financial statements for the years ended March 31, 1991 and 1990 and independent auditors\' report'

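However, when I use csv.reader() or csv.DictReader(), these cells are parsed as strings, not arrays. What's a simple way to do this? I can't use split(',') because some strings have commas in the middle of an item.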
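
A minimal reproduction of the behavior described above may help. It assumes the generating script passed a list straight to csv.writer, which stores the list's str() representation; the in-memory buffer and the sample row are made up for illustration.

```python
import csv
import io

# Hypothetical row: the second cell is a Python list, which csv.writer
# stores as its str() representation.
buffer = io.StringIO()
csv.writer(buffer).writerow(["id-1", ["Project No. 3126", "July, 2002"]])

# Read the row back the same way the question does.
buffer.seek(0)
row = next(csv.reader(buffer))
cell = row[1]

print(type(cell).__name__)  # the cell comes back as a str, not a list
print(cell)
```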

Answer 1

You can try splitting the string with a regular expression (you will need to work out one that fits your data), like so:

import re

test_str = '"July, 2002", CUREE Publication No. CEA-01.'
# Split only on commas that have no double quote anywhere after them
print(re.compile(r',(?!.+")').split(test_str))
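
As an alternative to the regex (not part of the answer above), the standard library's ast.literal_eval might be a better fit if each cell really is the text of a valid Python list literal, as the examples in the question suggest. That is an assumption about the data; a malformed cell would raise ValueError or SyntaxError. The parse_cell helper below is a hypothetical name used for illustration only.

```python
import ast


def parse_cell(cell):
    """Parse a cell holding a list literal; unwrap single-item lists."""
    value = ast.literal_eval(cell)  # safely evaluates literals only, no arbitrary code
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


# Cell contents as they would arrive from csv.reader (plain strings).
cell01 = """['"July, 2002"', 'CUREE Publication No. CEA-01.', 'Project No. 3126', 'Prepared for Consortium of Universities for Research in Earthquake Engineering.']"""
cell02 = """['[Memorandum from Ralph J. Johnson on Andy Place].']"""
cell03 = """["Financial statements for the years ended March 31, 1991 and 1990 and independent auditors' report"]"""

cell01_parsed = parse_cell(cell01)
cell02_parsed = parse_cell(cell02)
cell03_parsed = parse_cell(cell03)

print(cell01_parsed[0])
print(cell02_parsed)
print(cell03_parsed)
```

Commas inside items are not a problem here, because the parsing follows Python's own quoting rules rather than splitting on a delimiter.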
