I have a CSV file that I read with csv.DictReader() from the csv module. The rows come out like this:
{'biweek': '1', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '526822.1365'}
{'biweek': '2', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '526995.246'}
{'biweek': '3', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '527170.1981'}
{'biweek': '4', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '527347.0136'}
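I need a new dictionary that uses each 'loc' value as a key and the number of rows with that 'loc' as its value, because 'loc' repeats many times in the file.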
with open('Dalziel2015_data.csv') as fh:
    new_dct = {}
    cities = set()
    cnt = 0
    reader = csv.DictReader(fh)
    for row in reader:
        data = dict(row)
        cities.add(data.get('loc'))
        for (k, v) in data.items():
            if data['loc'] in cities:
                cnt += 1
        new_dct[data['loc']] = cnt + 1
print(new_dct)
example_file:
biweek,year,loc,cases,pop
1,1906,BALTIMORE,NA,526822.1365
2,1906,BALTIMORE,NA,526995.246
3,1906,BALTIMORE,NA,527170.1981
4,1906,BALTIMORE,NA,527347.0136
5,1906,BALTIMORE,NA,527525.7134
6,1906,BALTIMORE,NA,527706.3183
4,1906,BOSTON,NA,630880.6579
5,1906,BOSTON,NA,631295.9457
6,1906,BOSTON,NA,631710.8403
7,1906,BOSTON,NA,632125.3403
8,1906,BOSTON,NA,632539.4442
9,1906,BOSTON,NA,632953.1503
10,1907,BRIDGEPORT,NA,91790.75578
11,1907,BRIDGEPORT,NA,91926.14732
12,1907,BRIDGEPORT,NA,92061.90153
13,1907,BRIDGEPORT,NA,92198.01976
14,1907,BRIDGEPORT,NA,92334.50335
15,1907,BRIDGEPORT,NA,92471.35364
17,1908,BUFFALO,NA,413661.413
18,1908,BUFFALO,NA,413934.7646
19,1908,BUFFALO,NA,414208.4097
20,1908,BUFFALO,NA,414482.3523
21,1908,BUFFALO,NA,414756.5963
22,1908,BUFFALO,NA,415031.1456
23,1908,BUFFALO,NA,415306.0041
24,1908,BUFFALO,NA,415581.1758
25,1908,BUFFALO,NA,415856.6646
6,1935,CLEVELAND,615,890247.9867
7,1935,CLEVELAND,954,890107.9192
8,1935,CLEVELAND,965,889967.7823
9,1935,CLEVELAND,872,889827.5956
10,1935,CLEVELAND,814,889687.3781
11,1935,CLEVELAND,717,889547.1492
12,1935,CLEVELAND,770,889406.9283
13,1935,CLEVELAND,558,889266.7346
This is what I have so far. My keys are fine, but the counts are wrong. My result:
{'BALTIMORE': 29, 'BOSTON': 59, 'BRIDGEPORT': 89, 'BUFFALO': 134, 'CLEVELAND': 174}
I know pandas is a very good tool, but I need the code to use the csv module.
I would appreciate any help with this.
Thanks!
Paul
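Answer 0: (score: 1)
You are updating a single global counter rather than a counter for each location. You are also iterating over every column of every row and incrementing the counter each time for no reason.
Try this: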
import csv

with open('Dalziel2015_data.csv') as fh:
    new_dct = {}
    reader = csv.DictReader(fh)
    for row in reader:
        data = dict(row)
        # One increment per row, stored under that row's city.
        new_dct[data['loc']] = new_dct.get(data['loc'], 0) + 1
print(new_dct)
This line: new_dct[data['loc']] = new_dct.get(data['loc'], 0) + 1
takes the count stored so far for that city and adds 1 to it. If the city has no count yet, get returns 0.
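To try the idea without the real file, here is a minimal sketch that runs the same get-based counting on a few in-memory rows. The sample text and the helper name count_locations are made up for illustration; only csv.DictReader and dict.get come from the answer above.

```python
import csv
import io

# Hypothetical sample data, shaped like the file in the question.
SAMPLE = """biweek,year,loc,cases,pop
1,1906,BALTIMORE,NA,526822.1365
2,1906,BALTIMORE,NA,526995.246
4,1906,BOSTON,NA,630880.6579
"""


def count_locations(fh):
    """Return {loc: number of rows with that loc} for an open CSV file object."""
    counts = {}
    for row in csv.DictReader(fh):
        loc = row['loc']
        # get() returns 0 the first time a city is seen, so no KeyError is raised.
        counts[loc] = counts.get(loc, 0) + 1
    return counts


result = count_locations(io.StringIO(SAMPLE))
print(result)  # {'BALTIMORE': 2, 'BOSTON': 1}
```

The counter lives inside the dictionary, one entry per city, so nothing carries over from one city to the next the way the shared cnt variable did.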
Answer 1: (score: 1)
You can use collections.Counter to count how many times each city occurs in the CSV file. Counter.keys() also gives you all the cities found in the CSV:
import csv
from collections import Counter

with open('csvtest.csv') as fh:
    reader = csv.DictReader(fh)
    c = Counter(row['loc'] for row in reader)

print(dict(c))
print('Cities={}'.format([*c.keys()]))
Prints:
{'BALTIMORE': 6, 'BOSTON': 6, 'BRIDGEPORT': 6, 'BUFFALO': 9, 'CLEVELAND': 8}
Cities=['BALTIMORE', 'BOSTON', 'BRIDGEPORT', 'BUFFALO', 'CLEVELAND']
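Counter also has a most_common() method, which may help if the cities should be ranked by how often they appear. A small sketch on made-up in-memory rows (the sample text is an assumption, not the real Dalziel2015_data.csv):

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, shaped like the file in the question.
SAMPLE = """biweek,year,loc,cases,pop
17,1908,BUFFALO,NA,413661.413
18,1908,BUFFALO,NA,413934.7646
19,1908,BUFFALO,NA,414208.4097
6,1935,CLEVELAND,615,890247.9867
7,1935,CLEVELAND,954,890107.9192
1,1906,BALTIMORE,NA,526822.1365
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
c = Counter(row['loc'] for row in reader)

# most_common() returns (city, count) pairs sorted from highest count to lowest.
ranked = c.most_common()
print(ranked)  # [('BUFFALO', 3), ('CLEVELAND', 2), ('BALTIMORE', 1)]

# A city that never appeared counts as 0 instead of raising KeyError.
print(c['BOSTON'])  # 0
```

Because a missing key reads as 0, Counter needs no get() call or membership check before looking up a city.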