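I am trying to write a for loop that takes a dataframe of census data, computes the combined population of the three largest counties in each state, and writes those sums to a new Series. Here is the function that does not work: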
import numpy as np
import pandas as pd

## created a dataframe earlier with a census csv file called 'census_df'
def bad_function():
    only_counties = census_df.set_index(['STNAME'])
    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates()  ## get a unique list of all 50 states from the dataframe
    state_name = pd.Series(index = ser)
    for i in state_name.index:
        a = only_counties.loc[i, 'CENSUS2010POP']
        a = a.sort_values(ascending=False)
        population = np.sum(a[0:3])
        state_name.loc[i] = population
    return state_name
When I call this function, I get the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-59-dc2686648261> in <module>()
     26     return state_name
     27
---> 28 answer_six()

<ipython-input-59-dc2686648261> in answer_six()
     18     for i in state_name.index:
     19         a = only_counties.loc[i, 'CENSUS2010POP']
---> 20         a = a.sort_values(ascending=False)
     21
     22         population = np.sum(a[0:3])

AttributeError: 'numpy.int64' object has no attribute 'sort_values'
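However, when I dropped the loop for testing purposes and picked a single item from the index I want to iterate over ('Alabama'), calling sort_values in exactly the same way worked fine. Like this: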
def bad_function():
    only_counties = census_df.set_index(['STNAME'])
    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates()
    state_name = pd.Series(index = ser)
    a = only_counties.loc['Alabama', 'CENSUS2010POP']
    a = a.sort_values(ascending=False)
    b = np.sum(a[0:3])
    return a, b
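It returns exactly what I want: a, the state's counties sorted by population, and b, the sum of the three most populous counties. So what is going on?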
Answer 0: (score: 0)
What does the following print?
for i in state_name.index:
    print(i)
Does it print the state names, or something else from the index?
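One likely cause, assuming at least one state in census_df matches only a single row (District of Columbia is a common case, but that is a guess about your data): when a label matches exactly one row, .loc[label, column] returns a scalar such as numpy.int64 rather than a Series, and a scalar has no sort_values. The sketch below uses a small made-up dataframe in place of census_df; only the column names STNAME and CENSUS2010POP come from the question, the numbers are invented. It shows the scalar result, one way to always get a Series back, and a groupby alternative that avoids the loop.

```python
import pandas as pd

# Hypothetical stand-in for census_df; the values are invented for illustration.
census_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alabama", "District of Columbia"],
    "CENSUS2010POP": [100, 400, 300, 200, 600],
})
only_counties = census_df.set_index("STNAME")

# A label that matches several rows gives a Series ...
many = only_counties.loc["Alabama", "CENSUS2010POP"]
# ... but a label that matches exactly one row gives a scalar.
single = only_counties.loc["District of Columbia", "CENSUS2010POP"]
print(type(many).__name__, type(single).__name__)

# Passing the label inside a list makes .loc return a Series in both cases.
always_series = only_counties.loc[["District of Columbia"], "CENSUS2010POP"]
print(always_series.sort_values(ascending=False).head(3).sum())

# Loop-free alternative: sum the three largest values within each state.
top_three = census_df.groupby("STNAME")["CENSUS2010POP"].apply(
    lambda s: s.nlargest(3).sum()
)
print(top_three)
```

In your loop, changing only_counties.loc[i, 'CENSUS2010POP'] to only_counties.loc[[i], 'CENSUS2010POP'] should be enough if this is indeed the cause. Printing type(a) inside the loop would confirm which state triggers it.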