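I am trying to split this CSV file into a 2D list. The problem with my code is that it cuts off several fields on lines that contain quotation marks. The quotes are there to indicate that the comma inside them is not a field separator but part of the field itself. I have posted the code, sample data, and sample output. You can see that the first output row skips several fields compared with the other rows because of the quotes. What do I need to change in the regular expression line? Thanks in advance for any help.
Here is part of the code: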
import sys
import re
import time

# get the date
date = time.strftime("%x")

# function for reading in each line of file
# returns array of each line
def readIn(file):
    array = []
    for line in file:
        array.append(line)
    return array

def main():
    data = open(sys.argv[1], "r")
    template = open(sys.argv[2], "r")
    output = open(sys.argv[3], "w")
    finalL = []
    dataL = []
    dataL = readIn(data)
    templateL = []
    templateL = readIn(template)
    costY = 0
    dateStr = ""
    # split each line in the data by the comma unless there are quotes
    for i in range(0, len(dataL)):
        if '"' in dataL[i]:
            Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
            dataL[i] = Pattern.split(dataL[i])[1::2]
            for j in range(0, len(dataL[i])):
                dataL[i][j] = dataL[i][j].strip()
        else:
            temp = dataL[i].strip().split(",")
            dataL[i] = temp
Sample data:
OrgLevel3: ATHLET ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
OrgLevel3: BOOK ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
BOOK Direct,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
OrgLevel3: CARD ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
CARD Direct,,,253 ,,09:02:54,14.30 ,,
,,,,,,,,
Grand Total for CARD:,,,253 ,,09:02:54,14.30 ,,
Sample output:
['Grand Total for ATHLET:', '"1,312 "', '62:58:18', '130.62', '']
['Grand Total for BOOK:', '', '', '434 ', '', '14:59:18', '28.09 ', '', '']
['Grand Total for CARD:', '', '', '253 ', '', '09:02:54', '14.30 ', '', '']
Answer 0 (score: 0)
If you are trying to load a CSV file into a list, then the complete code to do so is:
import csv
import sys

# newline="" lets the csv module handle line endings itself
with open(sys.argv[1], newline="") as data:
    dataL = list(csv.reader(data))
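To make the difference concrete, here is a minimal sketch that runs both approaches on one row from the sample data. The regular expression is copied from the question; the names `sample`, `regex_fields`, and `csv_fields` are only illustrative. The pattern uses `+`, so it can only match non-empty fields, and the `[1::2]` slice keeps just those matches. Any run of consecutive commas therefore collapses into a single separator and the empty fields disappear, while `csv.reader` keeps all nine columns and also removes the surrounding quotes.

```python
import csv
import io
import re

# One row from the sample data, including its trailing newline.
sample = 'Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,\n'

# Approach from the question: split on the pattern, keep the captured groups.
pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
regex_fields = [field.strip() for field in pattern.split(sample)[1::2]]
print(regex_fields)
# ['Grand Total for ATHLET:', '"1,312 "', '62:58:18', '130.62', '']

# csv.reader accepts any iterable of lines; StringIO stands in for the file.
csv_fields = next(csv.reader(io.StringIO(sample)))
print(csv_fields)
# ['Grand Total for ATHLET:', '', '', '1,312 ', '', '62:58:18', '130.62 ', '', '']
```

Note that `csv.reader` does not strip whitespace inside a field, so `'1,312 '` keeps its trailing space.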
If your sample data is your actual input, then it needs some additional work first..., for example:
dataL = [row for row in csv.reader(data) if row and row[0].startswith('Grand Total for')]
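As a rough sketch of what that extra work might look like, the helper below keeps only the "Grand Total for" rows, strips stray whitespace, and converts the numeric columns. The function name `grand_totals` and the column positions (0 for the name, 3 for calls, 5 for duration, 6 for cost) are assumptions read off the sample data, not something confirmed for the real file.

```python
import csv
import io

# A few lines taken from the sample data in the question.
SAMPLE = """\
OrgLevel3: ATHLET ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
"""

def grand_totals(lines):
    """Hypothetical helper: return (name, calls, duration, cost) per total row."""
    totals = []
    for row in csv.reader(lines):
        # csv.reader yields [] for a completely blank line, so check row first.
        if not row or not row[0].startswith("Grand Total for"):
            continue
        fields = [field.strip() for field in row]
        # Assumed layout: calls in column 3, duration in 5, cost in 6.
        calls = int(fields[3].replace(",", ""))  # "1,312" -> 1312
        cost = float(fields[6])
        totals.append((fields[0], calls, fields[5], cost))
    return totals

totals = grand_totals(io.StringIO(SAMPLE))
for total in totals:
    print(total)
```

With a real file, the same function could be called on the open file object instead of the `StringIO` wrapper.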