Question

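I have a CSV file. Each column represents a parameter and contains a small number of values (e.g. 1, 2, 3, 5) repeated hundreds of times. I want to write a Python program that reads each column and stores its contents, without duplicates, in a dictionary {column_header: list_numbers}.

I tried adapting the example given in the Python documentation: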

from csv import reader

def getlist(file):
    content = dict()
    with open(file, newline = '') as inp:
        my_reader = reader(inp, delimiter = ' ')
        for col in zip(*my_reader):
            l = []
            for k in col:
                if k not in l:
                    l.append(k)
                print(k)    # for debugging purposes
            content[col[0]] = l

I expected printing k to show each element of a column, one at a time. Instead, I get several columns at once.

Any idea what is wrong?

Answer 1

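It looks like you are almost there. I used a set to detect duplicate numbers (more efficient, since a set membership test does not scan the whole list), and col[1:] skips the header so it does not end up among the values:

from csv import reader

def getlist(file):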
    content = {}
    with open(file, newline = '') as inp:
        my_reader = reader(inp, delimiter = ' ')
        for col in zip(*my_reader):
            content[col[0]] = l = []
            seen = set()
            for k in col[1:]:
                if k not in seen:
                    l.append(k)
                    seen.add(k)
    return content
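
As a variation on getlist, here is a minimal sketch that lets dict.fromkeys do the de-duplication while keeping first-seen order. It assumes Python 3.7+ (where dicts preserve insertion order) and a comma delimiter; the name unique_per_column and the sample data are made up for illustration.

```python
import csv
import io


def unique_per_column(lines, delimiter=","):
    """Map each column header to its distinct values, in first-seen order."""
    rows = csv.reader(lines, delimiter=delimiter)
    content = {}
    # zip(*rows) transposes the rows into columns
    for col in zip(*rows):
        header, *values = col
        # dict.fromkeys keeps only the first occurrence of each value
        content[header] = list(dict.fromkeys(values))
    return content


# Hypothetical in-memory sample standing in for a real file
sample = io.StringIO("a,b\n1,2\n1,3\n2,2\n")
result = unique_per_column(sample)
print(result)
```

Passing an open file object instead of the StringIO works the same way, as long as the file was opened with newline=''.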

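Make sure your delimiter is the right one. If getlist does not work for you, print() will probably show whole rows as single strings, with the delimiter still inside them.

Say your file uses , as the delimiter while the reader is configured with a space; the output would look like: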

{'a,b,c,d': ['0,1,2,3', '1,2,3,4']}

With the correct delimiter configured, you get:

{'d': ['3', '4'], 'c': ['2', '3'], 'b': ['1', '2'], 'a': ['0', '1']}
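
The two outputs above can be reproduced with a small sketch like the following. The helper name columns and the in-memory sample text are assumptions made for illustration; only csv.reader and zip come from the code in this thread.

```python
import csv
import io

# Hypothetical comma-separated sample matching the outputs shown above
text = "a,b,c,d\n0,1,2,3\n1,2,3,4\n"


def columns(text, delimiter):
    """Transpose the parsed rows and map each header to the rest of its column."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {col[0]: list(col[1:]) for col in zip(*rows)}


wrong = columns(text, " ")   # each row stays one unsplit string
right = columns(text, ",")   # each row is split into four fields

print(wrong)
print(right)
```

With the wrong delimiter every row is a one-element list, so zip produces a single "column" whose entries are the complete lines.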

Answer 2

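Does the following Python script work for you?

import csv

test_file = 'test.csv'

# In Python 3, open the file in text mode with newline='' for the csv module
with open(test_file, newline='') as f:
    csv_file = csv.DictReader(f, delimiter=',')
    for line in csv_file:
        # 'x' is the header of the column to read
        print(line['x'])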
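
To get from the DictReader loop to the dictionary asked for in the question, something like the following sketch might work. It assumes a comma-delimited file whose first row holds the headers; the function name unique_by_header and the sample data are hypothetical.

```python
import csv
import io


def unique_by_header(lines, delimiter=","):
    """Collect the distinct values of every column, keyed by its header."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    # fieldnames is None for empty input, hence the fallback to []
    names = reader.fieldnames or []
    seen = {name: set() for name in names}
    content = {name: [] for name in names}
    for row in reader:
        for name in names:
            value = row[name]
            if value not in seen[name]:
                seen[name].add(value)
                content[name].append(value)
    return content


# Hypothetical in-memory sample standing in for test.csv
sample_xy = io.StringIO("x,y\n1,5\n2,5\n1,3\n")
by_header = unique_by_header(sample_xy)
print(by_header)
```

Unlike the zip(*my_reader) approach, this reads the file row by row and never holds all rows in memory at once.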

Get the elements of a csv file
