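I want to create a function that returns the count of each attribute value from a CSV file. The output should be a set of dictionaries (one per attribute), where the keys are the distinct attribute values and the associated values are the number of times each value appears in the data.
For example, I have the following CSV file (the first row is the header):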
First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
I would like to get this as output:
for first name: {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
for the second name: {'Jackson': 2, 'Got': 2}
and for the age: {22: 2, 50: 1, 12: 1}
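I think I can do this with the Counter type from Python's collections module, combined with the DictReader class from csv, so that each row is a dictionary as well. But I still can't get it to work. Does anyone know whether this is possible? Here is what I have tried so far. :)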
import csv
import os
import collections

FIRSTNAME_ATT = 'First_Name'
LASTNAME_ATT = 'Last_Name'
AGE_ATT = 'Age'

def count_attributes(file_name):
    firstname_counts = {}
    lastname_counts = {}
    age_counts = {}
    with open(file_name, encoding='utf-8') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            for i, val in enumerate(row):
                count_number[i][val] += 1
                # Here I don't get any further :(
    return firstname_counts, lastname_counts, age_counts

if __name__ == '__main__':
    data_file = os.path.join("..", "data", "thecsvfile.csv")
    firstname_counts, lastname_counts, age_counts = count_attributes(data_file)
    print(firstname_counts)
    print(lastname_counts)
    print(age_counts)
It would be great if someone had a hint or an idea for how to solve this. :)
Answer 0 (score: 1)
Solution:
firstname_counts = {}
lastname_counts = {}
age_counts = {}
with open(file_name, encoding='utf-8') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        firstname_counts[row['First_Name']] = firstname_counts.get(row['First_Name'], 0) + 1
        lastname_counts[row['Last_Name']] = lastname_counts.get(row['Last_Name'], 0) + 1
        # similar for age...
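You just need to check whether the key already exists in the dictionary. If it does, add 1 to its value; if it does not, start from 0 and add 1. The dictionary's .get method handles both cases, because it returns the supplied default (0 here) when the key is missing.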
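To make the .get pattern concrete, here is a minimal, self-contained sketch that completes the fragment above for all three columns. It is not the answerer's exact code: the in-memory SAMPLE string stands in for the real file, and the int() conversion of Age is an assumption added so the keys match the integer keys in the question's desired output (csv always yields strings).

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for an open file.
SAMPLE = (
    "First_Name,Last_Name,Age\n"
    "Johnny,Got,22\n"
    "Michael,Jackson,22\n"
    "Johnny,Jackson,50\n"
    "Andrea,Got,12\n"
)

def count_attributes(csv_file):
    """Count the values of the three columns from the question's CSV."""
    firstname_counts, lastname_counts, age_counts = {}, {}, {}
    for row in csv.DictReader(csv_file):
        first = row['First_Name']
        last = row['Last_Name']
        age = int(row['Age'])  # assumption: ages are whole numbers
        # .get returns 0 for a key that has not been seen yet.
        firstname_counts[first] = firstname_counts.get(first, 0) + 1
        lastname_counts[last] = lastname_counts.get(last, 0) + 1
        age_counts[age] = age_counts.get(age, 0) + 1
    return firstname_counts, lastname_counts, age_counts

firstname_counts, lastname_counts, age_counts = count_attributes(io.StringIO(SAMPLE))
print(firstname_counts)  # {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
print(lastname_counts)   # {'Got': 2, 'Jackson': 2}
print(age_counts)        # {22: 2, 50: 1, 12: 1}
```

With a real file, the same function can be called inside `with open(file_name, encoding='utf-8', newline='') as csv_file:`.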
Edit:
Solution 2 (using collections.Counter):
from collections import Counter
firstname_counts = Counter()
lastname_counts = Counter()
age_counts = Counter()
# same code as in the above solution.
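As a rough illustration of Solution 2, the sketch below runs the Counter version end to end on the question's sample data. The .get-based lines from the first solution would still work on a Counter; this sketch swaps in `+= 1` instead, which relies on a Counter treating missing keys as 0. The SAMPLE string and the int() conversion of Age are assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Sample data copied from the question; io.StringIO stands in for an open file.
SAMPLE = (
    "First_Name,Last_Name,Age\n"
    "Johnny,Got,22\n"
    "Michael,Jackson,22\n"
    "Johnny,Jackson,50\n"
    "Andrea,Got,12\n"
)

firstname_counts = Counter()
lastname_counts = Counter()
age_counts = Counter()

for row in csv.DictReader(io.StringIO(SAMPLE)):
    # A Counter returns 0 for unseen keys, so += 1 needs no existence check.
    firstname_counts[row['First_Name']] += 1
    lastname_counts[row['Last_Name']] += 1
    age_counts[int(row['Age'])] += 1  # assumption: ages are whole numbers

print(dict(firstname_counts))  # {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
print(firstname_counts.most_common(1))  # [('Johnny', 2)]
```

Counter is a dict subclass, so the results can be used anywhere a plain dictionary is expected.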
Answer 1 (score: 0)
Besides collections.Counter, you can use collections.OrderedDict to keep things simple and make the processing mostly "data-driven", in the sense that the contents of the CSV file itself determine what the attributes are (rather than hardcoding their names).
Using collections.OrderedDict preserves the order of the attributes from the CSV file's header row.
This is what I mean:
[code listing missing]
Output:
[output missing]
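A minimal sketch of the data-driven approach this answer describes might look like the following. It is not the answerer's original code: the function name count_attributes is borrowed from the question, the SAMPLE string stands in for the real file, and values are left as the strings that csv returns.

```python
import csv
import io
from collections import Counter, OrderedDict

# Sample data copied from the question; io.StringIO stands in for an open file.
SAMPLE = (
    "First_Name,Last_Name,Age\n"
    "Johnny,Got,22\n"
    "Michael,Jackson,22\n"
    "Johnny,Jackson,50\n"
    "Andrea,Got,12\n"
)

def count_attributes(csv_file):
    """Return an OrderedDict mapping each header name to a Counter of its values."""
    reader = csv.DictReader(csv_file)
    # The header row decides which attributes exist; no names are hardcoded.
    counts = OrderedDict((name, Counter()) for name in (reader.fieldnames or []))
    for row in reader:
        for name in counts:
            counts[name][row[name]] += 1
    return counts

counts = count_attributes(io.StringIO(SAMPLE))
for name, counter in counts.items():
    print(name, dict(counter))
```

Because the attribute names come from reader.fieldnames, the same function works unchanged on a CSV file with different or additional columns. On Python 3.7 and later a plain dict also keeps insertion order, so OrderedDict is mainly useful here for making that intent explicit.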