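I want to read a CSV file and create an object in Python to store a large dataset. The data is in a CSV file (with a header), and the first two entries in each row are the X, Y coordinates. Later in the program, I will sort the data and perform operations on it for each X, Y coordinate.
Sample data: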
x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
I think the object I want to create from this looks like the following:
Position, values
(1,2) => {field1=10,field2=20,field3=30},{field1=20,field2=30,field3=40}
(7,4) => {field1=2,field2=49,field3=39}
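Is this a dictionary of dictionaries with tuple keys? I have been searching online for an example of this and am having a hard time finding one. Does it make sense to handle the data this way?
So far I have been trying to put the data into a dictionary, but I am running into trouble. The code below only prints the headers: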
import csv
import sys

dict={}
with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)
    for row in data:
        for i in range(length):
            dict[headers[i]]=row[i]
for x in dict:
    print(x)
Answer 0 (score: 1)
import csv

# A class to hold the data from each line
class Capsule:
    def __init__(self, x, y, f1, f2, f3):
        self.x = x
        self.y = y
        self.field1 = f1
        self.field2 = f2
        self.field3 = f3

# Read the file
with open('/path/to/file') as infile:
    infile.readline()  # skip the header line
    capsules = []
    for x, y, f1, f2, f3 in csv.reader(infile):
        # convert to int so that sorting is numeric rather than lexicographic
        capsules.append(Capsule(int(x), int(y), int(f1), int(f2), int(f3)))
# Done reading all data

# Sort the list by x, y coordinates
capsules.sort(key=lambda c: (c.x, c.y))
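Using a list this way is handy for sorting and similar operations. However, if you are interested in finding out which objects are at a particular set of coordinates, you are better off using a dictionary: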
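with open('/path/to/file') as infile:
    infile.readline()  # skip the header line
    capsules = {}
    for x, y, f1, f2, f3 in csv.reader(infile):
        # convert to int so that the keys sort numerically
        x, y = int(x), int(y)
        if (x, y) not in capsules:
            capsules[(x, y)] = []
        capsules[(x, y)].append(Capsule(x, y, int(f1), int(f2), int(f3)))

# Lists of capsules, ordered by their x, y coordinates:
sortedCapsules = [capsules[k] for k in sorted(capsules)]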
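As a rough sketch of how the dictionary version might be used afterwards, the example below groups the question's sample rows by coordinate and then looks one position up. It is only an illustration: the sample data is fed in through `io.StringIO` instead of a real file, and `Record` and `load_by_position` are hypothetical names (a namedtuple standing in for the `Capsule` class above).

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the Capsule class above
Record = namedtuple("Record", ["x", "y", "field1", "field2", "field3"])

# Sample data from the question, used here instead of a real file
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""

def load_by_position(source):
    """Group records by their (x, y) coordinates."""
    next(source)  # skip the header line
    grouped = {}
    for row in csv.reader(source):
        # int() tolerates the blank that follows each comma
        record = Record(*(int(value) for value in row))
        grouped.setdefault((record.x, record.y), []).append(record)
    return grouped

capsules = load_by_position(io.StringIO(SAMPLE))

# Look up everything recorded at one coordinate
at_1_2 = capsules[(1, 2)]

# Visit the coordinates in sorted order
ordered_keys = sorted(capsules)
```

Because the keys are tuples of integers, `sorted(capsules)` orders positions by x first and then by y.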
Answer 1 (score: 0)
I think this code will help:
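import csv

with open('data.csv') as file:
    data = csv.reader(file)
    # strip the blank that follows each comma in the header line
    headers = [h.strip() for h in next(data)]
    length = len(headers)
    res = dict()
    for row in data:
        fields = dict()
        for i in range(2, length):
            fields[headers[i]] = int(row[i])
        # append rather than assign, so repeated coordinates keep every row
        res.setdefault((int(row[0]), int(row[1])), []).append(fields)
for x in res:
    print(x, res[x])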
Answer 2 (score: 0)
Assuming your CSV structure is known and fixed:
import csv
from collections import defaultdict

HEADERS = ["x", "y", "field1", "field2", "field3"]

def read_data(source):
    data = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the sample data
    reader = csv.DictReader(source, fieldnames=HEADERS, skipinitialspace=True)
    next(reader)  # skip headers
    for row in reader:
        # this will at once build the key tuple
        # and remove the "x" and "y" keys from the
        # row dict
        key = row.pop("x"), row.pop("y")
        data[key].append(row)
    return data

with open('data.csv') as source:
    data = read_data(source)
print(data)
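The question also mentions sorting and operating on the data for each coordinate later on. The following is a minimal sketch of what that could look like, not a definitive implementation. It is a variation on the function above that makes several assumptions: the header names are read from the file itself, all values are integers, and the sample data comes from `io.StringIO` rather than `data.csv`. The `totals` aggregation is purely illustrative.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question, used here instead of data.csv
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""

def read_data(source):
    data = defaultdict(list)
    # skipinitialspace drops the blank after each comma, in the header too
    reader = csv.DictReader(source, skipinitialspace=True)
    for row in reader:
        key = int(row.pop("x")), int(row.pop("y"))
        # assumption: every remaining field holds an integer
        data[key].append({name: int(value) for name, value in row.items()})
    return data

data = read_data(io.StringIO(SAMPLE))

# A per-coordinate operation: total of field1 at each position
totals = {key: sum(r["field1"] for r in rows) for key, rows in data.items()}

# Walk the positions in sorted order, sorting the rows within
# each position by field1, largest first
for key in sorted(data):
    data[key].sort(key=lambda r: r["field1"], reverse=True)
```

Converting the values to integers up front keeps both the key ordering and the arithmetic numeric rather than string-based.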
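As a side note: don't use dict or file as variable names, especially at the top level, since doing so shadows the builtin dict and file types (file is a builtin type only in Python 2).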
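To illustrate the shadowing problem, here is a small sketch. The function name `shadowing_demo` is hypothetical, and the shadowing is confined to a function so that the rest of the module is unaffected.

```python
import builtins

def shadowing_demo():
    dict = {}  # this local name now hides the builtin dict type
    try:
        dict(a=1)  # tries to call the empty dictionary, not the type
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()

# Outside the function the name still refers to the builtin
still_builtin = dict is builtins.dict
```

Had the assignment been made at the top level of the module, every later use of `dict` in that module would have hit the same error.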