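Creating a dictionary of dictionaries (or tuples?) from a CSV in Python

Time: 2014-12-18 13:11:42

Tags: python csv dictionary tuples

I would like to read a CSV file and create an object in Python to store a large dataset. The data is in a CSV file (with headers), and the first two entries in each row represent X,Y coordinates. Later in the program I will sort the data and perform operations on it for each X,Y coordinate.

Sample data here: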

x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39

I think the object I want to create from this looks like the following:

location, values

(1,2) => {field1=10,field2=20,field3=30},{field1=20,field2=30,field3=40}
(7,4) => {field1=2,field2=49,field3=39}

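Is this a dictionary of dictionaries with tuple keys? I have been searching online for an example of this and am having a hard time finding one. Does it make sense to handle the data this way?

So far I have been trying to put the data into a single dictionary, but I am running into trouble. The code below prints only the headers: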

import csv
import sys

dict={}

with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)
    for row in data:
        for i in range(length):
            dict[headers[i]]=row[i]

for x in dict:
    print(x)

3 answers:

Answer 0: (score: 1)

import csv

# let's create a class to hold the data in each line
class Capsule:
    def __init__(self, x, y, f1, f2, f3):
        self.x = x
        self.y = y
        self.field1 = f1
        self.field2 = f2
        self.field3 = f3

# let's read the file
with open('/path/to/file') as infile:
    infile.readline()
    capsules = []
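    # skipinitialspace drops the blank after each comma in the sample data
    for x, y, f1, f2, f3 in csv.reader(infile, skipinitialspace=True):
        # store the coordinates as ints so the sort below is numeric, not textual
        capsules.append(Capsule(int(x), int(y), f1, f2, f3))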


# done reading all data
# let's sort the list by x,y coordinates
capsules.sort(key=lambda c : (c.x, c.y))

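Using a list this way is handy for sorting and similar operations. However, if you are interested in knowing which objects sit at a particular pair of coordinates, you are better off using a dictionary: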

with open('/path/to/file') as infile:
    infile.readline()
    capsules = {}
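    for x, y, f1, f2, f3 in csv.reader(infile, skipinitialspace=True):
        key = (int(x), int(y))  # numeric key so sorted() orders coordinates by value
        if key not in capsules:
            capsules[key] = []
        capsules[key].append(Capsule(int(x), int(y), f1, f2, f3))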

# sort by x,y coordinates:
sortedCapsules = [capsules[k] for k in sorted(capsules)]
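
The dictionary approach above can also be sketched as a self-contained script. This is only an illustration under a few assumptions: the sample data from the question is inlined through io.StringIO instead of being read from a file, every field is assumed to be an integer, and the names Record and load_records are hypothetical stand-ins for the Capsule class, not part of the original answer.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type standing in for the Capsule class above.
Record = namedtuple("Record", ["x", "y", "field1", "field2", "field3"])

# Inline copy of the question's sample data so the sketch runs without a file.
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""


def load_records(source):
    """Group rows by their (x, y) coordinate, converting every cell to int."""
    reader = csv.reader(source, skipinitialspace=True)
    next(reader)  # skip the header row
    grouped = {}
    for row in reader:
        record = Record(*(int(cell) for cell in row))
        grouped.setdefault((record.x, record.y), []).append(record)
    return grouped


records = load_records(io.StringIO(SAMPLE))
for key in sorted(records):  # numeric order, because the keys hold ints
    print(key, [r.field1 for r in records[key]])
```

A namedtuple keeps attribute access (record.field1) like the class does, without the need to write __init__ by hand.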

Answer 1: (score: 0)

I think this code will help:

import csv
import sys

with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)

    res = dict()
    for row in data:

        fields = dict()
        for i in range(2,length):
            fields[headers[i]]=int(row[i])
        # append instead of assign, so rows sharing a coordinate are all kept
        res.setdefault((int(row[0]), int(row[1])), []).append(fields)

for x in res:
    print(x, res[x])

Answer 2: (score: 0)

Assuming your CSV structure is known and fixed:

import csv
import sys
from collections import defaultdict

HEADERS = ["x", "y", "field1", "field2", "field3"]

def read_data(source):
    data = defaultdict(list)
    reader = csv.DictReader(source, fieldnames=HEADERS)
    next(reader) # skip headers
    for row in reader:
        # this will at once build the key tuple
        # and remove the "x" and "y" keys from the 
        # row dict
        key = row.pop("x"), row.pop("y")
        data[key].append(row)
    return data

with open('data.csv') as source:
    data = read_data(source)

print(data)
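
Since the question also mentions sorting and operating on the data per coordinate, here is a rough sketch of what that might look like with this kind of structure. It is a variation, not the code above: it assumes integer fields, lets csv.DictReader take the field names from the header row instead of a fixed HEADERS list, and inlines the sample data through io.StringIO. The totals dictionary is a hypothetical example of a per-coordinate operation.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the question's sample data so the sketch runs without a file.
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""


def read_data(source):
    """Map each (x, y) tuple to the list of field dicts found at that point."""
    data = defaultdict(list)
    # skipinitialspace strips the blank after each comma, header row included
    reader = csv.DictReader(source, skipinitialspace=True)
    for row in reader:
        key = int(row.pop("x")), int(row.pop("y"))
        data[key].append({name: int(value) for name, value in row.items()})
    return data


data = read_data(io.StringIO(SAMPLE))

# Sum of field1 at each coordinate, visited in sorted coordinate order.
totals = {key: sum(row["field1"] for row in data[key]) for key in sorted(data)}
print(totals)
```

Because the keys are tuples of ints, sorted(data) orders the coordinates by x first and then by y.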

As a side note: don't use dict and file as variable names, especially at the top level, since they shadow the built-in dict and file types.
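
A small sketch of why the shadowing matters, written for Python 3 (where file is no longer a built-in, although the same reasoning applies to dict). The function name shadowed_call is hypothetical, and the shadowing is confined to a function here only to keep the example contained; at the top level the same rebinding would affect the whole module.

```python
def shadowed_call():
    dict = {}  # this local name hides the built-in type inside the function
    try:
        return dict(a=1)  # calls the empty dict instance, not the type
    except TypeError as error:
        return "TypeError: {}".format(error)


result = shadowed_call()
print(result)

# Outside the function the built-in is untouched and still works as a type.
print(dict(a=1))
```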