Question

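I want to read a CSV file and create an object in Python to store a large data set. The data is in a CSV file (with headers), and the first two entries in each row are the X, Y coordinates. Later in the program I will sort the data and perform operations on it for each X, Y coordinate.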

Sample data here:

x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39

I think the object I want to create from this looks like the following:

Location, Value

(1,2) => {field1=10,field2=20,field3=30},{field1=20,field2=30,field3=40}
(7,4) => {field1=2,field2=49,field3=39}

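Is this a dictionary of dictionaries with tuple keys? I have been searching online for an example of this and have had a hard time finding one. Does it make sense to handle the data this way?

So far I have been trying to put the data into a dictionary, but I am running into trouble. The code below only prints the headers: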

import csv
import sys

dict={}

with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)
    for row in data:
        for i in range(length):
            dict[headers[i]]=row[i]

for x in dict:
    print x

Answer 1

import csv

# let's create a class to hold the data in each line
class Capsule:
    def __init__(self, x, y, f1, f2, f3):
        self.x = x
        self.y = y
        self.field1 = f1
        self.field2 = f2
        self.field3 = f3

# let's read the file
with open('/path/to/file') as infile:
    infile.readline()
    capsules = []
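    for x, y, f1, f2, f3 in csv.reader(infile):
        # csv.reader yields strings; convert the coordinates so the
        # sort below is numeric rather than lexicographic
        capsules.append(Capsule(int(x), int(y), f1, f2, f3))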


# done reading all data
# let's sort the list by x,y coordinates
capsules.sort(key=lambda c : (c.x, c.y))

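Using a list this way is handy for sorting and the like. However, if you are interested in finding which objects sit at a particular set of coordinates, you are better off using a dictionary: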

with open('/path/to/file') as infile:
    infile.readline()
    capsules = {}
    for x, y, f1, f2, f3 in csv.reader(infile):
        # numeric key, so sorted(capsules) orders coordinates numerically
        key = (int(x), int(y))
        if key not in capsules:
            capsules[key] = []
        capsules[key].append(Capsule(key[0], key[1], f1, f2, f3))

# sort by x,y coordinates:
sortedCapsules = [capsules[k] for k in sorted(capsules)]
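
To illustrate the lookup that the dictionary makes possible, here is a minimal, self-contained sketch. It assumes Python 3, uses an in-memory string (`SAMPLE`) in place of the file, stores plain lists of integers instead of `Capsule` objects, and passes `skipinitialspace=True` because the sample data has a space after each comma. The names `SAMPLE` and `group_by_coordinate` are made up for this illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for the CSV file from the question.
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""

def group_by_coordinate(source):
    """Map each (x, y) tuple to the list of field rows found at that point."""
    reader = csv.reader(source, skipinitialspace=True)
    next(reader)  # skip the header row
    groups = {}
    for x, y, *fields in reader:
        key = (int(x), int(y))
        # setdefault creates the empty list the first time a key is seen
        groups.setdefault(key, []).append([int(f) for f in fields])
    return groups

groups = group_by_coordinate(io.StringIO(SAMPLE))
rows_at_1_2 = groups[(1, 2)]   # every row recorded at coordinate (1, 2)
sorted_keys = sorted(groups)   # tuples sort by x first, then y
```

Looking up `groups[(1, 2)]` returns both rows recorded at that coordinate, which is the "location => values" shape sketched in the question.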

Answer 2

I think this code will help:

import csv
import sys

with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)

    res = dict()
    for row in data:

        fields = dict()
        for i in range(2,length):
            fields[headers[i]]=int(row[i])
        # append rather than assign, so rows sharing a coordinate are all kept
        res.setdefault((int(row[0]), int(row[1])), []).append(fields)

for x in res:
    print(x, res[x])
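
One detail worth checking with the sample data from the question: it has a space after each comma, and `csv.reader` keeps that space by default, so the header names used as keys would come out as `' field1'` rather than `'field1'`. The sketch below, which assumes Python 3 and a shortened in-memory sample (`SAMPLE` is a made-up name), shows the difference that the `skipinitialspace` option makes.

```python
import csv
import io

# Hypothetical shortened sample with a space after each comma.
SAMPLE = "x, y, field1\n1, 2, 10\n"

# Default behaviour: the space after the delimiter stays in the value.
raw_headers = next(csv.reader(io.StringIO(SAMPLE)))

# skipinitialspace=True drops whitespace that directly follows a delimiter.
clean_headers = next(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))
```

The numeric cells are less affected, since `int(' 2')` tolerates surrounding whitespace, but the header strings are used verbatim as dictionary keys.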

Answer 3

Assuming your CSV structure is known and fixed:

import csv
import sys
from collections import defaultdict

HEADERS = ["x", "y", "field1", "field2", "field3"]

def read_data(source):
    data = defaultdict(list)
    reader = csv.DictReader(source, fieldnames=HEADERS)
    next(reader) # skip headers
    for row in reader:
        # this will at once build the key tuple
        # and remove the "x" and "y" keys from the 
        # row dict
        key = row.pop("x"), row.pop("y")
        data[key].append(row)
    return data

with open('data.csv') as source:
    data = read_data(source)

print(data)
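
The question also mentions sorting and operating on the data per coordinate later on. As a rough sketch of that step, assuming Python 3, integer-valued cells, and an in-memory sample (`SAMPLE` and `totals` are made-up names), a variant of `read_data` that converts values to `int` can be walked in sorted key order like this:

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-in for data.csv.
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""

def read_data(source):
    data = defaultdict(list)
    # No fieldnames given, so DictReader takes them from the header row.
    reader = csv.DictReader(source, skipinitialspace=True)
    for row in reader:
        # pop builds the key and removes "x" and "y" from the row dict
        key = int(row.pop("x")), int(row.pop("y"))
        data[key].append({name: int(value) for name, value in row.items()})
    return data

data = read_data(io.StringIO(SAMPLE))

# Visit the coordinates in sorted order and total field1 at each one.
totals = {key: sum(rec["field1"] for rec in data[key]) for key in sorted(data)}
```

Summing `field1` is only an example operation; any per-coordinate computation can be substituted inside the comprehension.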

As a side note: don't use dict or file as variable names, especially at the top level, since doing so shadows the builtin dict and file types.
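
A small sketch of what that shadowing looks like in practice. The function name `shadowing_demo` is made up for illustration, and the rebinding is kept inside a function so it does not affect the rest of the module:

```python
def shadowing_demo():
    dict = {}  # this local name now hides the builtin dict type
    try:
        dict(a=1)  # attempts to call the empty dictionary, not the type
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
```

Here `message` holds the `TypeError` text raised because a dictionary instance is not callable. At module level the same rebinding would affect every later use of `dict` in the file.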

Creating a dictionary of dictionaries (or tuples?) from a CSV in Python

3 Answers: