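Count attribute values from a CSV?

Time: 2017-11-09 18:39:21

Tags: python csv dictionary count

I want to create a function that returns the count of each attribute value from a CSV. The output should be a dictionary (one per attribute) whose keys are the distinct attribute values and whose associated values are the number of times each value appears in the data.

For example, I have the following CSV file (the first line is the header):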

First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12

I would like to get this as output:

for first name: {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
for the second name: {'Jackson': 2, 'Got': 2}
and for the age: {22: 2, 50: 1, 12: 1}

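I think I can do this using the Counter type from Python's collections module together with the csv module's DictReader class, so that each row is also a dictionary. But I still can't get it to work. Does anyone know whether this is possible? Here is what I have tried so far. :)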

import csv
import os
import collections

FIRSTNAME_ATT = 'First_Name'
LASTNAME_ATT = 'Last_Name'
AGE_ATT = 'Age'


def count_attributes(file_name):
    firstname_counts = {}
    lastname_counts = {}
    age_counts = {}

    with open(file_name, encoding='utf-8') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            for i, val in enumerate(row):
                count_number[i][val] += 1
# Here I don't get any further :(
    return firstname_counts, lastname_counts, age_counts


if __name__ == '__main__':
    data_file = os.path.join("..", "data", "thecsvfile.csv")
    firstname_counts, lastname_counts, age_counts = count_attributes(data_file)
    print(firstname_counts)
    print(lastname_counts)
    print(age_counts)

It would be great if anyone has a hint or an idea on how to solve this. :)

2 answers:

Answer 0: (score: 1)

Solution

import csv

firstname_counts = {}
lastname_counts = {}
age_counts = {}

# file_name is the path to the CSV file (the function argument in the question)
with open(file_name, encoding='utf-8') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        # dict.get returns 0 when the key has not been seen yet
        firstname_counts[row['First_Name']] = firstname_counts.get(row['First_Name'], 0) + 1
        lastname_counts[row['Last_Name']] = lastname_counts.get(row['Last_Name'], 0) + 1
        # similar for age...

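You just need to check whether the key already exists in the dictionary: if it does, add 1 to its value; if it does not, start from 0 and add 1. The dictionary's .get method handles both cases.

Reference: dict .get method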
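
For completeness, the .get approach above can be sketched as a full function. This is a minimal sketch that assumes the column names from the question; reading from a file-like object, the inline SAMPLE string, and converting Age to int (so the keys match the integer keys in the desired output) are illustrative choices, not part of the original answer.

```python
import csv
import io


def count_attributes(csv_file):
    """Count the values of each column using plain dicts and dict.get."""
    firstname_counts = {}
    lastname_counts = {}
    age_counts = {}

    for row in csv.DictReader(csv_file):
        first = row['First_Name']
        last = row['Last_Name']
        age = int(row['Age'])  # csv yields strings; convert to get integer keys

        # dict.get(key, 0) returns 0 for a key that has not been seen yet
        firstname_counts[first] = firstname_counts.get(first, 0) + 1
        lastname_counts[last] = lastname_counts.get(last, 0) + 1
        age_counts[age] = age_counts.get(age, 0) + 1

    return firstname_counts, lastname_counts, age_counts


# Sample data copied from the question, held in memory for the demo
SAMPLE = """First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
"""

first_counts, last_counts, age_counts = count_attributes(io.StringIO(SAMPLE))
print(first_counts)  # {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
print(last_counts)   # {'Got': 2, 'Jackson': 2}
print(age_counts)    # {22: 2, 50: 1, 12: 1}
```

To read from disk instead, pass an open file, for example `open(file_name, encoding='utf-8', newline='')`; the csv module documentation recommends `newline=''` for files handed to its readers.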

Edit:

Solution 2 (using collections.Counter)

from collections import Counter

firstname_counts = Counter()
lastname_counts = Counter()
age_counts = Counter()

# same code as in the above solution.
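
Because a Counter treats a missing key as 0, the loop can also be written with `+= 1` instead of `.get`. The following is a minimal sketch of that variant, assuming the question's column names; the function name `count_attributes_with_counter` and the inline SAMPLE data are hypothetical and used only for illustration.

```python
import csv
import io
from collections import Counter


def count_attributes_with_counter(csv_file):
    """Count the values of each column using collections.Counter."""
    firstname_counts = Counter()
    lastname_counts = Counter()
    age_counts = Counter()

    for row in csv.DictReader(csv_file):
        # A Counter returns 0 for unseen keys, so += 1 works directly
        firstname_counts[row['First_Name']] += 1
        lastname_counts[row['Last_Name']] += 1
        age_counts[int(row['Age'])] += 1  # int() gives integer keys

    return firstname_counts, lastname_counts, age_counts


# Sample data copied from the question, held in memory for the demo
SAMPLE = """First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
"""

first_c, last_c, age_c = count_attributes_with_counter(io.StringIO(SAMPLE))
print(first_c.most_common(1))  # [('Johnny', 2)]
print(dict(last_c))
print(dict(age_c))
```

A Counter is a dict subclass, so the results can be used anywhere a dictionary is expected, and `most_common()` is available for ranking the values.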

Answer 1: (score: 0)

In addition, you can use collections.Counter to keep things simple and make the processing mostly "data-driven", in the sense that the contents of the CSV file itself determine what the attributes are (rather than hard-coding their names).

Using a collections.OrderedDict preserves the order of the attributes in the CSV file's header row.

Here is what I mean:

collections.OrderedDict

Output:

OrderedDict
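
A minimal sketch of the data-driven approach this answer describes might look like the following. It is an assumption about the shape of the idea, not the answerer's original code: the function name `count_all_attributes` and the inline SAMPLE data are hypothetical, and values are left as the strings that the csv module yields (so ages are counted as '22', not 22).

```python
import csv
import io
from collections import Counter, OrderedDict


def count_all_attributes(csv_file):
    """Build one Counter per column, with column names taken from the header."""
    reader = csv.DictReader(csv_file)

    # reader.fieldnames is read from the header row (None for an empty file)
    counts = OrderedDict((name, Counter()) for name in (reader.fieldnames or []))

    for row in reader:
        for name in counts:
            counts[name][row[name]] += 1

    return counts


# Sample data copied from the question, held in memory for the demo
SAMPLE = """First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
"""

all_counts = count_all_attributes(io.StringIO(SAMPLE))
for attribute, counter in all_counts.items():
    print(attribute, dict(counter))
```

Since no column name is hard-coded, the same function works for a CSV with different or additional columns, and the outer OrderedDict iterates in header order.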