Question

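I am trying to write a for loop that takes a DataFrame of census data, computes the combined population of the three most populous counties in each state, and writes that sum to a new Series. This is the function that does not work: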

import numpy as np
import pandas as pd

##created a dataframe earlier with a census csv file called 'census_df'


def bad_function():
    only_counties = census_df.set_index(['STNAME'])

    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates() ##get a unique list of all 50 states from the dataframe

    state_name = pd.Series(index = ser)


    for i in state_name.index:
        a = only_counties.loc[i, 'CENSUS2010POP']
        a = a.sort_values(ascending=False)

        population = np.sum(a[0:3])

        state_name.loc[i] = population

    return state_name

When I call this function, I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-59-dc2686648261> in <module>()
     26     return state_name
     27 
---> 28 answer_six()

<ipython-input-59-dc2686648261> in answer_six()
     18     for i in state_name.index:
     19         a = only_counties.loc[i, 'CENSUS2010POP']
---> 20         a = a.sort_values(ascending=False)
     21 
     22         population = np.sum(a[0:3])

AttributeError: 'numpy.int64' object has no attribute 'sort_values'

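However, when I drop the loop for testing purposes, pick a single item ('Alabama') from the index I want to iterate over, and call the same sort_values method in the same way, it works fine. Like this: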

def bad_function():
    only_counties = census_df.set_index(['STNAME'])

    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates()

    state_name = pd.Series(index = ser)

    a = only_counties.loc['Alabama', 'CENSUS2010POP']
    a = a.sort_values(ascending=False)

    b = np.sum(a[0:3])

    return a, b

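It returns exactly what I want: a, the state's counties sorted by population, and b, the sum of the three most populous counties. So what is going on?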

Answer 1

Does the following loop

for i in state_name.index:
    print(i)

print state names or index positions?
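One likely explanation, offered as a guess since the question does not show the data: if some state label occurs only once in the index, `.loc[label, column]` returns a scalar rather than a Series, and a `numpy.int64` has no `sort_values`. The sketch below reproduces that with a small made-up frame (the rows and the single-row "District of Columbia" entry are hypothetical stand-ins for `census_df`), then shows one way around it using `groupby`, which passes each group as a Series even when it has a single row.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df; the real CSV is not shown.
census_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alabama", "District of Columbia"],
    "CTYNAME": ["A", "B", "C", "D", "District of Columbia"],
    "CENSUS2010POP": [100, 400, 300, 200, 600],
})

only_counties = census_df.set_index("STNAME")

# A label that occurs several times in the index yields a Series...
many = only_counties.loc["Alabama", "CENSUS2010POP"]
# ...but a label that occurs exactly once yields a plain scalar.
single = only_counties.loc["District of Columbia", "CENSUS2010POP"]
print(type(many).__name__, type(single).__name__)

# Grouping avoids the problem: every group arrives as a Series,
# so nlargest(3) works regardless of how many rows the state has.
top3 = census_df.groupby("STNAME")["CENSUS2010POP"].apply(
    lambda s: s.nlargest(3).sum()
)
print(top3)
```

If the loop is kept, selecting with a list, as in `only_counties.loc[[i], 'CENSUS2010POP']`, should also return a Series for single-row labels.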
