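I have a simple 3-column CSV file. Using Python, I need to group the rows by one key, average the values of another key, and return the results. The file is in standard CSV format, laid out as follows: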
ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40
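So basically, I need to compute the average rate (col[2]) for each unique zipcode (col[1]) in this file and return the results, i.e. the average rate across all records for 19003, for 19083, and so on.
I have looked at using the csv module to read the file into a dictionary and then sorting the dict by the unique values in the zipcode column, but I can't seem to make any progress.
Any help or suggestions are appreciated.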
Answer 0 (score: 7)
I've documented each step to help clarify things:
import csv
from collections import defaultdict

# a dictionary whose values default to an empty list
data = defaultdict(list)

# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number, and
# skipinitialspace=True drops the space that follows each comma
with open('data.csv', newline='') as f:
    for i, row in enumerate(csv.reader(f, skipinitialspace=True)):
        # skip the header line and any empty rows
        # we take advantage of the first row being indexed at 0
        # i=0 which evaluates as false, as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, zipcode, level = row
        # for each zipcode, add the level to the list
        data[zipcode].append(float(level))

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.items():
    print(zipcode, sum(levels) / len(levels))
Output:
19003 29.415
19083 29.65
19102 21.4
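For comparison, here is a minimal Python 3 sketch of the same grouping idea that looks columns up by header name through csv.DictReader and averages with statistics.mean. The function name average_rate_by_zipcode and the in-memory sample are illustrative assumptions, not part of the answer above; the header names come from the question's sample file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_rate_by_zipcode(lines):
    """Return {zipcode: mean rate} for CSV lines with an ID, ZIPCODE, RATE header.

    Hypothetical helper: `lines` is any iterable of text lines, such as an
    open file or an io.StringIO object.
    """
    # skipinitialspace=True strips the space after each comma, so the
    # header fields are read as 'ZIPCODE' and 'RATE', not ' ZIPCODE'
    reader = csv.DictReader(lines, skipinitialspace=True)
    rates = defaultdict(list)
    for row in reader:
        rates[row['ZIPCODE']].append(float(row['RATE']))
    # one mean per zipcode, in order of first appearance (Python 3.7+)
    return {zipcode: mean(values) for zipcode, values in rates.items()}


# sample data copied from the question
sample = io.StringIO(
    "ID, ZIPCODE, RATE\n"
    "1, 19003, 27.50\n"
    "2, 19003, 31.33\n"
    "3, 19083, 41.4\n"
    "4, 19083, 17.9\n"
    "5, 19102, 21.40\n"
)
averages = average_rate_by_zipcode(sample)
for zipcode, average in averages.items():
    print(zipcode, average)
```

To read a real file instead, something like `with open('data.csv', newline='') as f: average_rate_by_zipcode(f)` should work, assuming the file carries the same header row.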
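Answer 1 (score: 3)
Usually, when I need to do more complex processing, I use csv to load the rows into a relational database table (sqlite is the quickest way), then use standard SQL to extract the data and compute the averages: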
import csv
import sqlite3
from io import StringIO

# sample rows from the question, without the header line
data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""
f = StringIO(data)
reader = csv.reader(f)

# an in-memory database is enough for a one-off calculation
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
conn.commit()

# convert the rate to a float and insert each row
for e in reader:
    e[2] = float(e[2])
    c.execute("""insert into data
                 values (?,?,?)""", e)
conn.commit()

# let SQL do the grouping and averaging
c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
for row in c:
    print(row)
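The same SQL approach can be sketched for CSV input that still has its header row and the spaces after each comma, as in the question's file. This is only a Python 3 sketch under those assumptions: the function name average_rates_sql is hypothetical, and the `order by` clause is an addition made here to give the result a predictable order.

```python
import csv
import io
import sqlite3


def average_rates_sql(lines):
    """Return [(zipcode, average rate), ...] computed by SQLite.

    Hypothetical helper: `lines` is any iterable of CSV text lines whose
    first line is the ID, ZIPCODE, RATE header.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('create table data (ID text, ZIPCODE text, RATE real)')
        # executemany inserts every non-empty row in a single call;
        # filter(None, reader) drops the empty lists produced by blank lines
        conn.executemany(
            'insert into data values (?,?,?)',
            ((row_id, zipcode, float(rate))
             for row_id, zipcode, rate in filter(None, reader)),
        )
        cursor = conn.execute(
            'select ZIPCODE, avg(RATE) from data '
            'group by ZIPCODE order by ZIPCODE'
        )
        return cursor.fetchall()
    finally:
        conn.close()


# sample data copied from the question
sample = io.StringIO(
    "ID, ZIPCODE, RATE\n"
    "1, 19003, 27.50\n"
    "2, 19003, 31.33\n"
    "3, 19083, 41.4\n"
    "4, 19083, 17.9\n"
    "5, 19102, 21.40\n"
)
results = average_rates_sql(sample)
for zipcode, average in results:
    print(zipcode, average)
```

Passing the generator to executemany avoids building an intermediate list, which may matter for larger files.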