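The problem I'm having is that I can't figure out why my code doesn't produce the output I want. It may have to do with my understanding of dictionaries or with the logic in my code. Can someone help me build these nested dictionaries? Link to the CSV: https://docs.google.com/document/d/1v68_QQX7Tn96l-b0LMO9YZ4ZAn_KWDMUJboa6LEyPr8/edit?usp=sharing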
import csv

data_by_region = {}
data_by_country = {}
answers = []
data = []
countries = False
f = open('dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv')
reader = csv.DictReader(f)
for line in reader:
    # This gets all the values into a standard dict
    data.append(dict(line))
# This will loop thru the dict and create variables to hold specific items
for i in data:
    # collects all of the Region/Country/Area
    places = i['Region/Country/Area']
    # Gets All the Years
    years = i['Year']
    i_d = i['ID']
    info = i['Footnotes']
    series = i['Series']
    value = float(i['Value'])
    # print(series)
    stats = {i['Series']: i['Value']}
    # print(stats)
    if (i['ID'] == '4'):
        countries = True
    if countries == True:
        if places not in data_by_country:
            data_by_country[places] = {}
        if years not in data_by_country:
            data_by_country[places][years] = {}
        data_by_country[places][years].update(stats)
        # if series not in data_by_country:
        #     data_by_country[places][years][series] = {}
        # if value not in data_by_country:
        #     data_by_country[places][years][series] = value
    else:
        if places not in data_by_region:
            data_by_region[places] = {}
        if years not in data_by_region:
            data_by_region[places][years] = {}
        data_by_region[places][years] = stats
        # if series not in data_by_region:
        #     data_by_region[places][series] = series
        # # if value not in data_by_region:
        #     data_by_region[places][years][series] = value
print(data_by_region['Western Africa'])
The output I am trying to get looks like this:
"Western Africa" : {
    2005: {
        "Population annual rate of increase (percent)": 2.6,
        "Total fertility rate (children per women)": 6,
        "Infant mortality for both sexes (per 1,000 live births)": 95.7,
        "Life expectancy at birth for both sexes (years)": 49.3,
        "Life expectancy at birth for males (years)": 48.4,
        "Life expectancy at birth for females (years)": 50.2
    },
    2010: {
        <data>
    },
    2015: {
        <data>
    }
}
Answer 0 (score: 1)
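I strongly recommend using the pandas package. It is designed for managing exactly the kind of information you have, and it comes with many analysis and visualization features, so you can probably reach your goal with it.
For example, you can read the file like this: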
import pandas as pd
filename = 'dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv'
df = pd.read_csv(filename)
In your case, you also need to specify "," as the thousands separator:
df = pd.read_csv(filename, thousands=r',')
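To see what the `thousands` argument changes, here is a minimal sketch on an in-memory CSV. The two sample rows, the name "Sample Region" and the series name are made up for illustration; only the column names are taken from the question.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real file has more rows and columns.
sample = io.StringIO(
    'Region/Country/Area,Year,Series,Value\n'
    'Sample Region,2005,Sample series,"1,234"\n'
    'Sample Region,2010,Sample series,"2,345.5"\n'
)

# Without thousands=..., values such as "1,234" stay as text.
plain = pd.read_csv(sample)

# Rewind the buffer and read again, treating "," as a thousands separator.
sample.seek(0)
parsed = pd.read_csv(sample, thousands=',')

print(type(plain['Value'].iloc[0]))
print(parsed['Value'].tolist())
```

With the separator declared, the `Value` column is parsed as numbers, so it can be aggregated or pivoted directly.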
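This gives you an object (a DataFrame) whose information is organized in columns. You can manage it or convert it into a dictionary in several ways, or use it directly to reach your goal.
You can get all the rows for a given ID: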
df[df['ID'] == 4]
Or the rows for a specific region:
wa = df[df['Region/Country/Area'] == 'Western Africa']
Or you can get all the unique values to iterate over:
unique_regions = df['Region/Country/Area'].unique()
With that sub-DataFrame, you can build a pivot table like this:
wa1 = pd.pivot_table(wa, index='Year', columns='Series', values='Value')
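As a small illustration of what the pivot does, the sketch below reshapes a few long-format rows (one row per year and series) into one row per year with one column per series. The series names and values are placeholders, not taken from the real file.

```python
import pandas as pd

# Hypothetical long-format rows: one row per (Year, Series) pair.
long_rows = pd.DataFrame({
    'Year': [2005, 2005, 2010, 2010],
    'Series': ['Series A', 'Series B', 'Series A', 'Series B'],
    'Value': [2.6, 6.0, 2.7, 5.8],
})

# Years become the index, each distinct Series becomes a column.
wide = pd.pivot_table(long_rows, index='Year', columns='Series', values='Value')
print(wide)
```

Note that `pivot_table` aggregates duplicates with the mean by default, so if a (Year, Series) pair appeared more than once, its values would be averaged rather than raising an error.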
Then you can convert the new DataFrame into a list of dictionaries, one per row:
values = wa1.to_dict('records')
and get the list of index values (the years) with:
indexes = wa1.index
These two lists can be used to build the dictionary for each region:
d = {key: value for (key, value) in zip(indexes, values)}
{2005: {'Infant mortality for both sexes (per 1,000 live births)': 95.700000000000003,
'Life expectancy at birth for both sexes (years)': 49.299999999999997,
'Life expectancy at birth for females (years)': 50.200000000000003,
'Life expectancy at birth for males (years)': 48.399999999999999,
'Population annual rate of increase (percent)': 2.6000000000000001,
'Total fertility rate (children per women)': 6.0},
2010: {'Infant mortality for both sexes (per 1,000 live births)': 82.700000000000003,
'Life expectancy at birth for both sexes (years)': 52.299999999999997,
'Life expectancy at birth for females (years)': 53.200000000000003,
'Life expectancy at birth for males (years)': 51.5,
'Population annual rate of increase (percent)': 2.7000000000000002,
'Total fertility rate (children per women)': 5.7999999999999998},
2015: {'Infant mortality for both sexes (per 1,000 live births)': 70.5,
'Life expectancy at birth for both sexes (years)': 54.700000000000003,
'Life expectancy at birth for females (years)': 55.600000000000001,
'Life expectancy at birth for males (years)': 53.899999999999999,
'Population annual rate of increase (percent)': 2.7000000000000002,
'Total fertility rate (children per women)': 5.5}}
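As a possible shortcut, `DataFrame.to_dict('index')` returns a dictionary keyed by the index values directly, which should give the same year-to-series mapping without the `zip` step. The sketch below compares the two on a small hand-built frame; the frame stands in for `wa1` and its column names are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the pivoted frame wa1: years as index, series as columns.
wide = pd.DataFrame(
    {'Series A': [2.6, 2.7], 'Series B': [6.0, 5.8]},
    index=pd.Index([2005, 2010], name='Year'),
)

# The approach shown above: pair each index value with its row dictionary.
by_zip = {key: value for key, value in zip(wide.index, wide.to_dict('records'))}

# Alternative: let pandas key the row dictionaries by the index.
by_index = wide.to_dict('index')

print(by_index)
```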
Finally, you can use another loop to build a list or a dictionary with one item per region.
To summarize, with pandas your code can be reduced to:
import pandas as pd
filename = 'dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv'
df_total = pd.read_csv(filename, thousands=r',')
regions = df_total['Region/Country/Area'].unique()
out = {}
for reg in regions:
    df_region = df_total[df_total['Region/Country/Area'] == reg]
    pivot = df_region.pivot_table(index='Year', columns='Series', values='Value')
    values_by_year = pivot.to_dict('records')
    data_reg = {key: value for (key, value) in zip(pivot.index, values_by_year)}
    out[reg] = data_reg
out
This code produces the nested dictionary you are looking for:
{'Afghanistan': {2005: {'Infant mortality for both sexes (per 1,000 live births)': 89.5,
'Life expectancy at birth for both sexes (years)': 56.899999999999999,
'Life expectancy at birth for females (years)': 58.100000000000001,
'Life expectancy at birth for males (years)': 55.799999999999997,
'Maternal mortality ratio (deaths per 100,000 population)': 821.0,
'Population annual rate of increase (percent)': 4.4000000000000004,
'Total fertility rate (children per women)': 7.2000000000000002},
2010: {'Infant mortality for both sexes (per 1,000 live births)': 76.700000000000003,
'Life expectancy at birth for both sexes (years)': 60.0,
'Life expectancy at birth for females (years)': 61.299999999999997,
'Life expectancy at birth for males (years)': 58.899999999999999,
'Maternal mortality ratio (deaths per 100,000 population)': 584.0,
'Population annual rate of increase (percent)': 2.7999999999999998,
'Total fertility rate (children per women)': 6.4000000000000004},
2015: {'Infant mortality for both sexes (per 1,000 live births)': 68.599999999999994,
'Life expectancy at birth for both sexes (years)': 62.299999999999997,
'Life expectancy at birth for females (years)': 63.5,
'Life expectancy at birth for males (years)': 61.100000000000001,
'Maternal mortality ratio (deaths per 100,000 population)': 396.0,
'Population annual rate of increase (percent)': 3.2000000000000002,
'Total fertility rate (children per women)': 5.2999999999999998}},
'Africa': <DATA>,
.
.
.
'Zimbabwe': <DATA>}
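If you would rather stay with the `csv` module, two details in the question's loop appear to explain the unexpected output. The check `if years not in data_by_country:` tests the outer dictionary, whose keys are places, so it is always true and the year dictionary is reset on every row. The region branch also assigns `stats` instead of calling `update`, which replaces earlier series. Below is a minimal sketch of the same idea using `setdefault`, run on a few made-up sample rows (the ID value is hypothetical). It leaves out the region/country split and assumes that `Value` may contain thousands separators.

```python
import csv
import io

# Hypothetical sample rows, using the column names from the question.
sample = io.StringIO(
    'ID,Region/Country/Area,Year,Series,Value\n'
    '11,Western Africa,2005,Population annual rate of increase (percent),2.6\n'
    '11,Western Africa,2005,Total fertility rate (children per women),6\n'
    '11,Western Africa,2010,Population annual rate of increase (percent),2.7\n'
)

nested = {}
for row in csv.DictReader(sample):
    place = row['Region/Country/Area']
    year = int(row['Year'])
    # Strip thousands separators before converting, e.g. "1,234" -> 1234.0
    value = float(row['Value'].replace(',', ''))
    # setdefault creates each nesting level only the first time it is seen,
    # so existing years and series are never overwritten by an empty dict.
    nested.setdefault(place, {}).setdefault(year, {})[row['Series']] = value

print(nested['Western Africa'][2005])
```

With the real file, replacing `sample` with the opened CSV file should be enough, provided the column names match.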