Question

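I have been racking my brain over this for a few hours now. I am trying to replace the offense numbers 1-30 with their corresponding offense types, i.e. Stealing, Embezzlement, Burglary, etc., and then sort the results into a list.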

Here is an example of my current output:

Offense #: Total Victims

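Here is my code so far. I want to use the offense_map dictionary to replace the numbers 1-30 in the output with the offense types, and then sort the list in descending order of victim count (the right-hand column), largest first. I am working with roughly 100,000 rows of data, so efficiency matters a lot for this program.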

from collections import Counter

incidents_f =  open('incidents.csv', mode = "r")

crime_dict = dict()

for line in incidents_f:
    line_1st = line.strip().split(",")
    if line_1st[0].upper() != "REPORT_NO":
        report_no = line_1st[0]
        offense = line_1st[3]
        zip_code = line_1st[4]
        if len(zip_code) < 5:
            zip_code = "99999"

        if report_no in crime_dict:
            # list.append returns None, so the calls cannot be chained
            crime_dict[report_no].extend([zip_code, offense])
        else:
            crime_dict[report_no] = [zip_code]+[offense]

#close File
incidents_f.close()

details_f = open('details.csv',mode = 'r')
for line in details_f:
    line_1st = line.strip().split(",")
    if line_1st[0].upper() != "REPORT_NO":
        report_no = line_1st[0]
        involvement = line_1st[1]
        if involvement.upper() != 'VIC':
            continue

        else:
            crime_dict[report_no].append(involvement.upper())



#close File
details_f.close()


offense_map = {'1':'Homicide','2':'Rape','3':'Robbery','4':'Assault','5':'Burglary','6':'Stealing','7':'Auto Theft','8':'Non Agg Assault','9':'Arson','10':'Forgery','11':'Fraud','12':'Embezzlement','13':'Stolen Property','14':'Property Damage','15':'Weapons Law Violation','16':'Prostitution','17':'Sex Offense Other','18':'Possession/Sale/Dist','20':'Family Offense','21':'DUI','22':'Liquor Law Violation','24':'Disorderly','25':'Loitering','26':'Misc Violation','29':'Missing/Runaway','30':'Casualty/Suicide'}

victims_by_offense = {}
for k, v in crime_dict.items():
    offense = v[1]  # each value is [zip_code, offense, 'VIC', ...]
    if offense not in victims_by_offense:
        victims_by_offense[offense] = 0
    victims_by_offense[offense] += v.count('VIC')

for offense in sorted(victims_by_offense):
    print(offense, victims_by_offense[offense])

Answer 1

To get the list of keys in victims_by_offense in descending order of total victims:

victims_by_offense = {'1': 189, '10': 712, '11': 1844, '12': 184, '13': 147, '14': 4364, '15': 595, '16': 175, '17': 387, '18': 2893, '2': 597, '20': 661}
sorted_keys = sorted(victims_by_offense, key=victims_by_offense.get, reverse=True)

Then

for offense_no in sorted_keys:
    print(offense_map[offense_no], victims_by_offense[offense_no])

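I got (the tuple-style lines indicate Python 2; Python 3 prints the name and the count separated by a space):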

('Property Damage', 4364)
('Possession/Sale/Dist', 2893)
('Fraud', 1844)
('Forgery', 712)
('Family Offense', 661)
('Rape', 597)
('Weapons Law Violation', 595)
('Sex Offense Other', 387)
('Homicide', 189)
('Embezzlement', 184)
('Prostitution', 175)
('Stolen Property', 147)
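
The two steps above can also be combined into one self-contained sketch. It assumes Python 3 and uses a shortened copy of the data from this answer. The helper name `ranked_offenses`, the extra code '19', and the "Unknown offense" fallback label are hypothetical additions for illustration, not part of the original code.

```python
# Shortened copies of the dictionaries used above (illustrative data only).
offense_map = {'1': 'Homicide', '2': 'Rape', '10': 'Forgery',
               '11': 'Fraud', '14': 'Property Damage'}
victims_by_offense = {'1': 189, '10': 712, '11': 1844, '14': 4364,
                      '2': 597,
                      '19': 3}  # hypothetical code with no entry in offense_map


def ranked_offenses(counts, names):
    """Return (offense name, victim count) pairs, largest count first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # dict.get supplies a fallback label instead of raising KeyError
    return [(names.get(code, 'Unknown offense ' + code), total)
            for code, total in ordered]


for name, total in ranked_offenses(victims_by_offense, offense_map):
    print(name, total)
```

Using `names.get(...)` with a fallback keeps the loop from stopping on an offense number that is missing from the map, which matters here because the map skips some numbers between 1 and 30.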

Answer 2

I tweaked your code a bit to use csv.reader objects instead of stripping and splitting the lines yourself, and changed the data structure to

crimes = {report_no: {'offense': offense_number,
                      'zip': zip_code,
                      'victims': victim_count},
          ...}

but I think it works much better this way.

import csv
import operator

crimes = dict()

# build `crimes` dict with zero-count victims
with open("incidents.csv") as f:
    reader = csv.reader(f)
    headers = next(reader)
    for report_no, _, _, offense, zip_code, *_ in reader:
        if len(zip_code) < 5:
            zip_code = "99999"
        report = (zip_code, offense)
        crimes[report_no] = {'offense': offense,
                             'zip': zip_code,
                             'victims': 0}

# parse victims information
with open("details.csv") as f:
    reader = csv.reader(f)
    headers = next(reader)
    for report_no, involvement, *_ in reader:
        if involvement.upper() == "VIC":
            crimes[report_no]['victims'] += 1

offense_map = {'1':'Homicide',
               '2':'Rape',
               '3':'Robbery',
               '4':'Assault',
               '5':'Burglary',
               '6':'Stealing',
               '7':'Auto Theft',
               '8':'Non Agg Assault',
               '9':'Arson',
               '10':'Forgery',
               '11':'Fraud',
               '12':'Embezzlement',
               '13':'Stolen Property',
               '14':'Property Damage',
               '15':'Weapons Law Violation',
               '16':'Prostitution',
               '17':'Sex Offense Other',
               '18':'Possession/Sale/Dist',
               '20':'Family Offense',
               '21':'DUI',
               '22':'Liquor Law Violation',
               '24':'Disorderly',
               '25':'Loitering',
               '26':'Misc Violation',
               '29':'Missing/Runaway',
               '30':'Casualty/Suicide'}

counts = {k: 0 for k in offense_map.values()}
# start counting crimes by victim count (by name, not number)

for crime_info in crimes.values():
    try:
        offense_no = crime_info['offense']
        offense_name = offense_map[offense_no]
        counts[offense_name] += crime_info['victims']
    except KeyError:
        # we couldn't map that
        print("No such offense: {}".format(crime_info['offense']))

# sort by value
for k,v in sorted(counts.items(), key=operator.itemgetter(1), reverse=True):
    print(k, v)
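
The question imports `collections.Counter` without using it. As an alternative to the manual `counts` dictionary and the `sorted` call above, a `Counter` can accumulate the totals, and its `most_common()` method returns them largest first. This is only a minimal sketch: the small `crimes` dictionary follows the shape described in this answer, but its report numbers, zip codes, and victim counts are made up for illustration.

```python
from collections import Counter

# Hypothetical sample in the {report_no: {...}} shape described above.
crimes = {
    'R1': {'offense': '6', 'zip': '64110', 'victims': 2},
    'R2': {'offense': '12', 'zip': '99999', 'victims': 1},
    'R3': {'offense': '6', 'zip': '64111', 'victims': 3},
    'R4': {'offense': '5', 'zip': '64110', 'victims': 0},
}
offense_map = {'5': 'Burglary', '6': 'Stealing', '12': 'Embezzlement'}

counts = Counter()
for info in crimes.values():
    name = offense_map.get(info['offense'])
    if name is None:
        continue  # unmapped offense code; skipped in this sketch
    counts[name] += info['victims']

# most_common() yields (name, total) pairs sorted by total, descending
for name, total in counts.most_common():
    print(name, total)
```

Each record is visited once, and the final ordering is over at most a few dozen offense names, so the cost stays low for the roughly 100,000 rows mentioned in the question.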

Python: replacing values with the values from dictionary key-value pairs
