Python列表理解返回错误的标签

时间:2018-06-16 12:58:45

标签: python list pandas dataframe

以下是我的数据集示例:

Donor State       Donation    Year
Illinois          50          2013
other             99          2016
California        26          2017
Indiana           50          2017
New Jersey        50          2017
Connecticut       25          2016
Wisconsin         10          2017
New York          100         2017
Indiana           74.32       2015
Indiana           25          2015
Pennsylvania      10          2014
Pennsylvania      10          2013
New Jersey        5           2018
New Jersey        5           2018

我想要做的是查看我在不同年份的一系列特定国家的捐款,但由于某种原因,索引名称没有返回我需要的东西正在努力确定什么状态是什么。

state_list = ['Illinois','other','California','Indiana','New Jersey','Connecticut','Wisconsin','New York','Pennsylvania','Delaware']

我写了一个简单的列表理解:

pd.DataFrame([df[df['Donor State'] == i].groupby(['Year'])["Donation"].sum() for i in state_list])

我得到以下内容:

Year            2013    2014    2015    2016    2017    2018
Donation Amount 211.48  672.82  265.00  477.95  1900.01 52.00
Donation Amount NaN     NaN     120.00  99.00   25.00   NaN
Donation Amount 414.50  2767.12 2158.36 391.02  1839.95 50.00
Donation Amount 20.00   NaN     274.32  85.00   367.44  100.00
Donation Amount 100.00  100.00  199.95  624.00  719.00  245.00
Donation Amount 125.00  30.00   25.00   870.56  105.00  215.00
Donation Amount NaN     NaN     NaN     100.00  191.66  75.00
Donation Amount 235.00  85.00   849.46  1110.00 446.59  87.18
Donation Amount 160.00  205.00  124.00  250.00  150.00  25.00
Donation Amount NaN     NaN     110.00  52.89   100.00  NaN

我想要的是什么:

Year            2013    2014    2015    2016    2017    2018
Illinois        211.48  672.82  265.00  477.95  1900.01 52.00
California      NaN     NaN     120.00  99.00   25.00   NaN
Pennsylvania    414.50  2767.12 2158.36 391.02  1839.95 50.00
Delaware        20.00   NaN     274.32  85.00   367.44  100.00
...............................................................

有没有办法可以获取州名称标签并将其作为索引?数据框正是我想要的,但我需要索引中的“捐赠金额”作为州名。

很抱歉,如果这是一个愚蠢的问题而我错过了一些明显的事情

1 个答案:

答案 0 :(得分:2)

pivot_table + reindex

您可以使用pivot_table转换数据框,然后使用reindex指定您感兴趣的状态。

state_list = ['Illinois','other','California','Indiana','NewJersey','Connecticut',
              'Wisconsin','NewYork','Pennsylvania','Delaware']

res = df.pivot_table(index='DonorState', columns='Year', values='Donation',
                     aggfunc=sum, fill_value=0)\
        .reindex(state_list, axis=0)

print(res)

Year          2013  2014   2015  2016   2017  2018
DonorState                                        
Illinois      50.0   0.0   0.00   0.0    0.0   0.0
other          0.0   0.0   0.00  99.0    0.0   0.0
California     0.0   0.0   0.00   0.0   26.0   0.0
Indiana        0.0   0.0  99.32   0.0   50.0   0.0
NewJersey      0.0   0.0   0.00   0.0   50.0  10.0
Connecticut    0.0   0.0   0.00  25.0    0.0   0.0
Wisconsin      0.0   0.0   0.00   0.0   10.0   0.0
NewYork        0.0   0.0   0.00   0.0  100.0   0.0
Pennsylvania  10.0  10.0   0.00   0.0    0.0   0.0
Delaware       NaN   NaN    NaN   NaN    NaN   NaN