import pandas as pd
census_df = pd.read_csv('census.csv')
#census_df.head()
def answer_seven():
    census_df_1 = census_df[(census_df['SUMLEV'] == 50)].set_index('CTYNAME')
    census_df_1['highest'] = census_df_1[['POPESTIAMTE2010','POPESTIAMTE2011','POPESTIAMTE2012','POPESTIAMTE2013','POPESTIAMTE2014','POPESTIAMTE2015']].max()
    census_df_1['lowest'] = census_df_1[['POPESTIAMTE2010','POPESTIAMTE2011','POPESTIAMTE2012','POPESTIAMTE2013','POPESTIAMTE2014','POPESTIAMTE2015']].min()
    x = abs(census_df_1['highest'] - census_df_1['lowest']).tolist()
    return x[0]
answer_seven()
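This is an attempt to use the data in census.csv to find the county with the largest absolute change in population over 2010-2015 (the population estimate columns). I simply want to take the absolute difference between the maximum and the minimum across the yearly columns. The function has to return a string. Also, [(census_df['SUMLEV'] == 50)] keeps only the counties, since counties have SUMLEV set to 50. But the code fails with:
KeyError: "['POPESTIAMTE2010' 'POPESTIAMTE2011' 'POPESTIAMTE2012' 'POPESTIAMTE2013'\n 'POPESTIAMTE2014' 'POPESTIAMTE2015'] not in index"
Am I indexing the wrong data structure? I am new to data science and coding.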
Answer 0 (score: 2)
I think the column names in your code have a typo. The pattern is 'POPESTIMATE201?', not 'POPESTIAMTE201?'.
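A quick way to catch this kind of typo is to compare the names you asked for against the columns the frame actually has. The snippet below is only a sketch: `suggest_columns` is a hypothetical helper name, and the toy frame stands in for census.csv with made-up values. It uses the standard library's `difflib.get_close_matches` to propose the closest real column for each missing name.

```python
import difflib
import pandas as pd

def suggest_columns(df, wanted):
    """Map each requested name missing from df to its closest real column names."""
    available = list(df.columns)
    missing = [name for name in wanted if name not in available]
    # get_close_matches returns a (possibly empty) list, best match first
    return {name: difflib.get_close_matches(name, available, n=1) for name in missing}

# Toy frame standing in for census.csv (values are invented for illustration)
toy = pd.DataFrame({
    'CTYNAME': ['Some County'],
    'POPESTIMATE2010': [100],
    'POPESTIMATE2011': [110],
})

# Misspelled names, as in the question
wanted = ['POPESTIAMTE2010', 'POPESTIAMTE2011']
suggestions = suggest_columns(toy, wanted)
print(suggestions)
```

If the result is an empty dict, every requested column exists and the KeyError has some other cause.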
Any help shortening the code would be appreciated. Here is the working code:
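import pandas as pd
census_df = pd.read_csv('census.csv')
def answer_seven():
    # Keep county-level rows only (SUMLEV == 50) and index them by county name
    cdf = census_df[(census_df['SUMLEV'] == 50)].set_index('CTYNAME')
    columns = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
    # axis=1 takes the max/min across the yearly columns for each county
    cdf['big'] = cdf[columns].max(axis=1)
    cdf['sml'] = cdf[columns].min(axis=1)
    cdf['change'] = cdf['big'] - cdf['sml']
    # idxmax returns the index label (the county name, a string) of the largest change
    return cdf['change'].idxmax()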
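On shortening it: one possible version, as a sketch, builds the column names in a comprehension and skips the intermediate columns. Besides the typo, the question's code called .max() and .min() without axis=1, which reduces down each column instead of across the years for each county, so axis=1 is kept here. The function name `largest_population_swing` and the toy data are made up for illustration; with the real file you would pass in the frame read from census.csv.

```python
import pandas as pd

def largest_population_swing(df):
    """Return the county name with the largest (max - min) population estimate."""
    counties = df[df['SUMLEV'] == 50].set_index('CTYNAME')
    cols = [f'POPESTIMATE{year}' for year in range(2010, 2016)]
    # Row-wise spread between the highest and lowest yearly estimate
    swing = counties[cols].max(axis=1) - counties[cols].min(axis=1)
    return swing.idxmax()

# Toy data (invented values): one state-level row and two county rows
toy = pd.DataFrame({
    'SUMLEV': [40, 50, 50],
    'CTYNAME': ['Alpha', 'Beta County', 'Gamma County'],
    'POPESTIMATE2010': [1000, 100, 200],
    'POPESTIMATE2011': [5000, 110, 260],
    'POPESTIMATE2012': [2000, 120, 180],
    'POPESTIMATE2013': [3000, 130, 210],
    'POPESTIMATE2014': [4000, 140, 190],
    'POPESTIMATE2015': [4500, 150, 205],
})

result = largest_population_swing(toy)
print(result)
```

The state-level row (SUMLEV 40) has the biggest spread in the toy data but is filtered out, so the result comes from the county rows only.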