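I'm new to Python. I want to parse a CSV file so that quoted values are recognized. For example,
1997,Ford,E350,"Super, luxurious truck"
should be split as
('1997', 'Ford', 'E350', 'Super, luxurious truck')
and not as
('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')
If I use something like str.split(','),
I get the latter.
How do I do this? Also, would it be best to store these values in an array or some other data structure? After I get the values from the CSV, I want to be able to easily pick, say, any two columns and store them as another array or some other data structure.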
Answer 0 (score: 23)
You should use the csv module:
import csv
reader = csv.reader(['1997,Ford,E350,"Super, luxurious truck"'], skipinitialspace=True)
for r in reader:
    print(r)
Output:
['1997', 'Ford', 'E350', 'Super, luxurious truck']
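To show what skipinitialspace=True changes, here is a minimal sketch for Python 3. The sample line, with a space after every comma, is made up for illustration and is not from the question.

```python
import csv

# Hypothetical sample line: note the space after each comma.
line = '1997, Ford, E350, "Super, luxurious truck"'

# Without skipinitialspace, the space in front of the opening quote means the
# quote is not at the start of the field, so the comma inside it still splits.
plain = next(csv.reader([line]))

# With skipinitialspace=True, whitespace right after a delimiter is ignored,
# so the quoted field is recognized as a single value.
skipped = next(csv.reader([line], skipinitialspace=True))

print(plain)
print(skipped)
```

If the file has no spaces after its delimiters, the flag makes no difference and can be left out.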
Answer 1 (score: 14)
The following approach works well:
import csv

d = {}
d['column1name'] = []
d['column2name'] = []
d['column3name'] = []
# In Python 3, open the file in text mode with newline='' (Python 2 used 'rb')
with open('filename.csv', newline='') as f:
    dictReader = csv.DictReader(f, fieldnames=['column1name', 'column2name', 'column3name'], delimiter=',', quotechar='"')
    for row in dictReader:
        for key in row:
            d[key].append(row[key])
The columns are stored in a dictionary, with the column names as keys.
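Once the columns are in a dictionary like this, picking any two of them is a short step. The sketch below assumes a small stand-in for d with made-up values; select_columns is a hypothetical helper name, not part of the csv module.

```python
# Stand-in for the dictionary d built above; the values are made up.
d = {
    'column1name': ['1997', '1999'],
    'column2name': ['Ford', 'Chevy'],
    'column3name': ['E350', 'Venture'],
}

def select_columns(columns, names):
    """Return one tuple per row, restricted to the given column names."""
    return list(zip(*(columns[name] for name in names)))

pairs = select_columns(d, ['column1name', 'column3name'])
print(pairs)  # [('1997', 'E350'), ('1999', 'Venture')]
```

Returning tuples keeps each row's values together; if separate lists per column are preferred, indexing d directly by name is enough.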
Answer 2 (score: 5)
You have to define the double quote as the quotechar in the csv.reader() call:
>>> import csv
>>> with open(r'<path_to_csv_test_file>', newline='') as csv_file:
...     reader = csv.reader(csv_file, delimiter=',', quotechar='"')
...     print(next(reader))
...
['1997', 'Ford', 'E350', 'Super, luxurious truck']
>>>
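For reference, csv.reader already uses ',' as the delimiter and '"' as the quote character by default, so spelling them out mainly matters when the file uses something else. A minimal sketch for Python 3, using io.StringIO as a stand-in for an opened file (the single-quoted sample is made up):

```python
import csv
import io

# io.StringIO stands in for an opened file so the sketch is self-contained.
double_quoted = io.StringIO('1997,Ford,E350,"Super, luxurious truck"\n')
single_quoted = io.StringIO("1997,Ford,E350,'Super, luxurious truck'\n")

# quotechar defaults to '"', so no extra arguments are needed here.
row_default = next(csv.reader(double_quoted))

# A different quote character has to be named explicitly.
row_single = next(csv.reader(single_quoted, quotechar="'"))

print(row_default)
print(row_single)
```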
Answer 3 (score: 4)
If you don't want to use the csv module, you need to use a regular expression. Try this:
import re
array = re.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", '1997,Ford,E350,"Super, luxurious truck"')
If you try:
print(array[3])
you will get:
"Super, luxurious truck"
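Answer 4 (score: 0)
The csv.py module is probably fine, but if you want to see and/or control how the parsing works, here is a small pure-Python solution based on a coroutine: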
def csv_parser(delimiter=','):
    field = []
    while True:
        # Emit the finished field and wait for the first character of the next one
        char = (yield(''.join(field)))
        field = []
        leading_whitespace = []
        while char and char == ' ':
            leading_whitespace.append(char)
            char = (yield)
        if char == '"' or char == "'":
            # Quoted field: read up to the closing quote; a doubled quote is a literal quote
            suround = char
            char = (yield)
            while True:
                if char == suround:
                    char = (yield)
                    if not char == suround:
                        break
                field.append(char)
                char = (yield)
            # Skip anything between the closing quote and the next delimiter
            while not char == delimiter:
                if char is None:
                    (yield(''.join(field)))
                char = (yield)
        else:
            # Unquoted field: leading whitespace is part of the value
            field = leading_whitespace
            while not char == delimiter:
                if char is None:
                    (yield(''.join(field)))
                field.append(char)
                char = (yield)

def parse_csv(csv_text):
    processor = csv_parser()
    next(processor)  # start the processor coroutine
    split_result = []
    # None marks the end of the input
    for c in list(csv_text) + [None]:
        emit = processor.send(c)
        if emit:
            split_result.append(emit)
    return split_result

print(parse_csv('1997,Ford,E350,"Super, luxurious truck"'))
Tested on Python 2.7.