Parsing CSV correctly in Python

Time: 2012-09-06 09:04:42

Tags: python parsing csv

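I'm new to Python. I want to parse a CSV file so that quoted values are recognized. For example,

  1997,Ford,E350,"Super, luxurious truck"

should be split as

('1997', 'Ford', 'E350', 'Super, luxurious truck')

and not

('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')

which is what I get if I use something like str.split(',').

How do I do this? Also, would it be best to store these values in an array or some other data structure? After I get the values from the CSV, I want to be able to easily select, say, any two of the columns and store them as another array or some other data structure.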

5 answers:

Answer 0: (score: 23)

You should use the csv module:

import csv
reader = csv.reader(['1997,Ford,E350,"Super, luxurious truck"'], skipinitialspace=True)
for r in reader:
    print(r)

Output:

['1997', 'Ford', 'E350', 'Super, luxurious truck']
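
For the second part of the question (picking any two columns), here is a minimal sketch that assumes the data is small enough to hold in memory. The sample rows and the helper name select_columns are made up for illustration and are not part of the csv module:

```python
import csv
import io

# Hypothetical sample data standing in for a real file.
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '1999,Chevy,Venture,"Extended Edition"\n'
)

# Each parsed row is a list of strings, so the whole file becomes a list of lists.
rows = list(csv.reader(sample, skipinitialspace=True))

def select_columns(rows, indexes):
    """Return new rows that contain only the given column indexes (hypothetical helper)."""
    return [[row[i] for i in indexes] for row in rows]

print(select_columns(rows, [0, 3]))
# [['1997', 'Super, luxurious truck'], ['1999', 'Extended Edition']]
```

A plain list of lists is usually enough for this kind of selection; a dictionary of columns is an alternative when columns are addressed by name.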

Answer 1: (score: 14)

The following approach works well:

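import csv

# One list per column, keyed by column name.
fieldnames = ['column1name', 'column2name', 'column3name']
d = {name: [] for name in fieldnames}

# Python 3: open in text mode with newline='' (Python 2 used mode 'rb' instead).
with open('filename.csv', newline='') as csv_file:
    dictReader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter=',', quotechar='"')
    for row in dictReader:
        for key in row:
            d[key].append(row[key])

The columns are stored in a dictionary, with the column names as keys.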
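
Once the columns are in a dictionary, selecting any two of them is a lookup by name. A self-contained sketch of that idea, assuming Python 3; the in-memory sample and the column names year, make and description are invented for the example:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; a real file object works the same way.
sample = io.StringIO('1997,Ford,"Super, luxurious truck"\n2000,Mercury,Cougar\n')
fieldnames = ['year', 'make', 'description']  # assumed column names

# Build a column-oriented dictionary: name -> list of values.
columns = defaultdict(list)
for row in csv.DictReader(sample, fieldnames=fieldnames):
    for key, value in row.items():
        columns[key].append(value)

# Pair up any two columns by name.
pairs = list(zip(columns['year'], columns['description']))
print(pairs)
# [('1997', 'Super, luxurious truck'), ('2000', 'Cougar')]
```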

Answer 2: (score: 5)

You have to define the double quote as the quotechar in the csv.reader() call:

>>> import csv
>>> with open(r'<path_to_csv_test_file>') as csv_file:
...     reader = csv.reader(csv_file, delimiter=',', quotechar='"')
...     print(next(reader))
... 
['1997', 'Ford', 'E350', 'Super, luxurious truck']
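
A small sketch of how the quotechar parameter behaves, offered as an illustration rather than as part of the answer above. The csv module's default quotechar is already the double quote, so naming it mostly matters when the data uses a different quote character; the single-quoted sample line is made up:

```python
import csv

line = ['1997,Ford,E350,"Super, luxurious truck"']

# quotechar defaults to '"', so both calls parse the quoted field the same way.
explicit = next(csv.reader(line, delimiter=',', quotechar='"'))
default = next(csv.reader(line))

# A different quote character has to be named explicitly.
single = next(csv.reader(["1997,Ford,E350,'Super, luxurious truck'"], quotechar="'"))

print(explicit == default == single)
# True
```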

Answer 3: (score: 4)

If you don't want to use the csv module, you can use a regular expression instead. Try this:

import re
array = re.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", '1997,Ford,E350,"Super, luxurious truck"')

If you then run:

print(array[3])

you get (note that the surrounding quotes are kept):

"Super, luxurious truck"
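
Because the regex split keeps the quote characters, a post-processing step is needed to match the result asked for in the question. A minimal sketch, assuming fields never contain escaped quotes such as ""; the helper name strip_quotes is hypothetical:

```python
import re

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that are not inside a quoted field.
pattern = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')
line = '1997,Ford,E350,"Super, luxurious truck"'

def strip_quotes(value):
    """Remove one pair of surrounding double quotes, if present (hypothetical helper)."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value

fields = [strip_quotes(f) for f in pattern.split(line)]
print(fields)
# ['1997', 'Ford', 'E350', 'Super, luxurious truck']
```

This handles the simple case only; for doubled quotes or multi-line fields the csv module remains the safer choice.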

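Answer 4: (score: 0)

The csv.py module is probably fine, but if you want to see and/or control how the parsing works, here is a small pure-Python solution based on a coroutine: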

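def csv_parser(delimiter=','):
    """Coroutine: takes one character per send(); yields a finished field, or None while mid-field."""
    field = []
    while True:
        # Hand back the completed field and wait for the first character of the next one.
        char = (yield ''.join(field))
        field = []

        # Collect leading spaces; they are kept only for unquoted fields.
        leading_whitespace = []
        while char and char == ' ':
            leading_whitespace.append(char)
            char = (yield)

        if char == '"' or char == "'":
            surround = char
            char = (yield)
            while True:
                if char == surround:
                    char = (yield)
                    # A doubled quote is a literal quote; anything else closes the field.
                    if not char == surround:
                        break

                field.append(char)
                char = (yield)

            # Skip anything between the closing quote and the delimiter.
            while not char == delimiter:
                if char is None:  # end of input: emit the last field
                    (yield ''.join(field))
                char = (yield)
        else:
            field = leading_whitespace
            while not char == delimiter:
                if char is None:  # end of input: emit the last field
                    (yield ''.join(field))
                field.append(char)
                char = (yield)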

def parse_csv(csv_text):
    processor = csv_parser()
    next(processor)  # start the processor coroutine

    split_result = []
    # A trailing None marks the end of input and flushes the last field.
    for c in list(csv_text) + [None]:
        emit = processor.send(c)
        if emit:  # None means "still mid-field"; note that empty fields are skipped too
            split_result.append(emit)

    return split_result

print(parse_csv('1997,Ford,E350,"Super, luxurious truck"'))

Tested on Python 2.7.