Counting multiple occurrences in an imported .csv file

Time: 2014-09-30 20:51:28

Tags: python csv if-statement count field

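Starting from a large imported dataset, I am trying to identify and print every row that corresponds to a city with at least 2 distinct colleges/universities.

What I have so far (relevant code):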

for line in file:

    fields = line.split(",")
    ID, name, city = fields[0], fields[1], fields[3]
    count = line.count()

    if line.count(city) >= 2:
        if line.count(ID) < 2:
            print "ID:", ID, "Name: ", name, "City: ", city

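In other words, I want to eliminate 1) any duplicate school listings (by ID; many institutions appear repeatedly in this file), and 2) any city that does not have two or more institutions.

Thanks!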

1 answer:

Answer 0: (score: 0)

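Dicts come in handy when you want to organize data by some key. In your case, nested dicts indexed first by city and then by ID should do the trick: storing the same ID twice simply overwrites the earlier entry, so duplicates disappear on their own.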

# will hold cities[city][ID] = [ID, name, city]
cities = {}

for line in file:
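    # split the CSV line on commas; columns 0, 1 and 3 are ID, name and city
    fields = line.strip().split(",")
    ID, name, city = fields[0], fields[1], fields[3]
    # key the outer dict by city (not name) so schools are grouped per city
    cities.setdefault(city, {})[ID] = [ID, name, city]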

# each value of 'cities' is a dict mapping ID -> row for one city;
# keep the rows of every city that has at least 2 distinct IDs
multi_schooled_cities = [list(ids_by_city.values()) for ids_by_city in cities.values() if len(ids_by_city) >= 2]
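
To tie the pieces together, here is a minimal end-to-end sketch of the nested-dict idea, written for Python 3. It assumes the column layout from the question (ID in column 0, name in column 1, city in column 3). The function name `rows_in_multi_school_cities` and the sample rows are made up for illustration, and the `csv` module is swapped in for the manual `split(",")` so that quoted fields containing commas would also be handled.

```python
import csv
import io
from collections import defaultdict


def rows_in_multi_school_cities(lines):
    """Return one [ID, name, city] row per distinct ID, limited to
    cities that contain at least two distinct IDs."""
    cities = defaultdict(dict)  # cities[city][ID] = [ID, name, city]
    for fields in csv.reader(lines):
        if len(fields) < 4:
            continue  # skip blank or short lines
        ID, name, city = fields[0], fields[1], fields[3]
        # Storing the same ID again overwrites the old entry,
        # which is what removes the repeated listings.
        cities[city][ID] = [ID, name, city]

    result = []
    for city in sorted(cities):
        schools = cities[city]
        if len(schools) >= 2:  # at least two distinct institutions
            for ID in sorted(schools):
                result.append(schools[ID])
    return result


# Hypothetical sample data: column 2 is a placeholder the code ignores.
sample = io.StringIO(
    "100,Alpha College,x,Springfield\n"
    "100,Alpha College,x,Springfield\n"
    "101,Beta University,x,Springfield\n"
    "200,Gamma College,x,Shelbyville\n"
    "200,Gamma College,x,Shelbyville\n"
)

rows = rows_in_multi_school_cities(sample)
for ID, name, city in rows:
    print("ID: %s Name: %s City: %s" % (ID, name, city))
```

With a real file, you would pass an open file object instead of the `io.StringIO` sample, for example `open(path, newline="")`, where `path` is whatever your CSV is called.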