Question

I have a file in the following format:

name | age | gender
abc  |  4  |  M
xyz  |  5  |  F
pqr  |  6  |  M
stu  |  5  |  F

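It is a CSV file, so name, age, and gender are separate columns.

I am trying to store the age values in a list and count how many times each age occurs.

Something like: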

age_list = [4,5,6,5]

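along with the number of occurrences of each element. I think I know how to do the counting part; what I can't work out is how to store the age values in a list.

I have posted only a small snippet of the file to keep the question clear. The actual file contains a large amount of data.

I just open the file in read mode and do the following: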

data = [line.strip() for line in file.readlines()]

I tried searching for similar questions but couldn't find any. I am new to this site, so I don't really know the rules or guidelines.

Answer 1

If you have a CSV-formatted file and want to use the csv library:

import csv
from collections import Counter

with open('csvfile.csv', 'r') as csvfile:
    data = csv.reader(csvfile, delimiter=',')
    next(data, None) # Ignore headers
    results = Counter([x[1] for x in data])
    print(results)
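
The snippet above assumes comma-separated fields. If the file really uses the padded `|` separators shown in the question, a variant along these lines might work. This is only a sketch: `read_ages` is a hypothetical helper name, and it assumes every data row holds a whole-number age in the second column.

```python
import csv
import io
from collections import Counter


def read_ages(lines):
    """Return the age column as a list of ints from '|'-separated lines."""
    reader = csv.reader(lines, delimiter='|')
    next(reader, None)  # skip the header row
    # Fields keep their padding (e.g. '  4  '), so strip before converting.
    # Blank lines come back as empty rows and are skipped.
    return [int(row[1].strip()) for row in reader if row]


# In-memory stand-in for the file in the question; an open file object
# can be passed to read_ages() in the same way.
sample = io.StringIO(
    "name | age | gender\n"
    "abc  |  4  |  M\n"
    "xyz  |  5  |  F\n"
    "pqr  |  6  |  M\n"
    "stu  |  5  |  F\n"
)

age_list = read_ages(sample)
age_counts = Counter(age_list)
print(age_list)       # [4, 5, 6, 5]
print(age_counts[5])  # 2
```

Converting to `int` here is a choice, not a requirement; it makes `age_list` match the `[4,5,6,5]` form the question asks for.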

If you don't want to import csv and you already have the data in a string variable, this may help:

from collections import Counter

data = """name | age | gender
abc  |  4  |  M
xyz  |  5  |  F
pqr  |  6  |  M
stu  |  5  |  F"""

cleaned_data = Counter([x.split('|')[1].strip() for x in data.split('\n')[1:]])
print(cleaned_data)

Both examples produce the same counts:

{
    '5': 2,
    '4': 1,
    '6': 1
}
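
Because a Counter is a dict subclass, the counts can be looked up like ordinary dictionary entries. A small sketch using the age strings from the sample data (the variable names are illustrative only):

```python
from collections import Counter

# Age strings as extracted from the sample data in the examples above.
counts = Counter(['4', '5', '6', '5'])

print(counts['5'])            # 2
print(counts['7'])            # 0, a missing key counts as zero
print(counts.most_common(1))  # [('5', 2)]

# Convert the string keys to ints if numeric ages are needed.
int_counts = {int(age): n for age, n in counts.items()}
```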

Answer 2

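You can use the csv reader or other libraries such as pandas and numpy, but if you want to use plain Python, here is one way to do it, with no extra imports needed: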

In [24]: ages = []
In [25]: with open("data.csv","r") as f:
   ....:     ages+=f.read().splitlines()
   ....:
In [26]: ages
Out[26]: ['name,age,gender', 'abc,4,M', 'xyz,5,F', 'pqr,6,M', 'stu,5,F']
In [27]: ages=[s.split(",")[1] for s in ages][1:] #all second cols(ages),except the first row
In [28]: ages
Out[28]: ['4', '5', '6', '5']
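
To finish the job without any imports, the strings can be converted to integers and tallied in a plain dictionary. A minimal sketch that picks up from the `ages` list above (the name `counts` is just for illustration, and it assumes every entry is a whole number):

```python
# The list produced by the session above.
ages = ['4', '5', '6', '5']

age_list = [int(a) for a in ages]  # [4, 5, 6, 5]

counts = {}
for age in age_list:
    # dict.get returns 0 the first time an age is seen.
    counts[age] = counts.get(age, 0) + 1

print(counts)  # {4: 1, 5: 2, 6: 1} on Python 3.7+, where dicts keep insertion order
```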

