Find the highest growth using Python pandas?

Time: 2019-10-08 12:50:57

Tags: python pandas numpy

I have a DataFrame named growth with 4 columns.

State Name  Average Fare ($)_x  Average Fare ($)_y  Average Fare ($)
0   AK  599.372368  577.790640  585.944324
1   AL  548.825867  545.144447  555.939466
2   AR  496.033146  511.867026  513.761296
3   AZ  324.641818  396.895324  389.545267
4   CA  368.937971  376.723839  366.918761
5   CO  502.611572  537.206439  531.191893
6   CT  394.105453  388.772428  370.904182
7   DC  390.872738  382.326510  392.394165
8   FL  324.941100  329.728524  337.249248
9   GA  485.335737  480.606365  489.574241
10  HI  326.084793  335.547369  298.709998
11  IA  428.151682  445.625840  462.614195
12  ID  482.092567  475.822275  491.714945
13  IL  329.449503  349.938794  346.022226
14  IN  391.627917  418.945137  412.242053
15  KS  452.312058  490.024059  420.182836

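The last three columns are each state's average fare per year. The second, third, and fourth columns are for 2017, 2018, and 2019 respectively. I want to find which state has had the highest fare growth since 2017.

I tried the code below, but it gives output that I can't really make sense of. I just need to find the state with the highest fare growth since 2017.

My code: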

growth[['Average Fare ($)_x','Average Fare ($)_y','Average Fare ($)']].pct_change()

3 Answers:

Answer 0: (score: 3)

growth[['Average Fare ($)_x','Average Fare ($)_y','Average Fare ($)']].pct_change(axis='columns')

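This should give you the percentage change from one year to the next, computed across the columns of each row rather than down the rows.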

growth['variation_percentage'] = growth[['Average Fare ($)_x','Average Fare ($)_y','Average Fare ($)']].pct_change(axis='columns').sum(axis=1)

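This should give you the sum of the year-over-year percentage changes, which approximates the cumulative percentage change. (The exact change from 2017 to 2019 is the 2019 fare divided by the 2017 fare, minus 1.)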
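To see how the summed changes compare with the exact 2017 to 2019 change, here is a minimal sketch. It assumes a small frame built from four rows of the question's table, with values rounded to two decimals. The names fare_cols, summed, exact, comparison and top_state are made up for illustration.

```python
import pandas as pd

# A few rows copied from the question's table (values rounded to 2 decimals).
growth = pd.DataFrame({
    'State Name': ['AK', 'AZ', 'IA', 'KS'],
    'Average Fare ($)_x': [599.37, 324.64, 428.15, 452.31],  # 2017
    'Average Fare ($)_y': [577.79, 396.90, 445.63, 490.02],  # 2018
    'Average Fare ($)': [585.94, 389.55, 462.61, 420.18],    # 2019
})
fare_cols = ['Average Fare ($)_x', 'Average Fare ($)_y', 'Average Fare ($)']

# Sum of the year-over-year changes (what the snippet above computes).
# The first column has no predecessor, so it is NaN and sum() skips it.
summed = growth[fare_cols].pct_change(axis='columns').sum(axis=1)

# Exact change from 2017 to 2019.
exact = growth['Average Fare ($)'] / growth['Average Fare ($)_x'] - 1

comparison = pd.DataFrame({
    'State Name': growth['State Name'],
    'summed': summed,
    'exact': exact,
})
print(comparison)

# State with the largest exact growth since 2017 (within this sample only).
top_state = growth.loc[exact.idxmax(), 'State Name']
print(top_state)
```

On this sample both measures pick the same state, but the two numbers differ slightly for every row, so the exact ratio is the safer choice if the size of the growth matters.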

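Answer 1: (score: 3)

You can do this (here the frame is called df and its columns have been renamed as in the output below):

df.set_index('State_name').pct_change(periods=1, axis='columns').idxmax()

If you want to compute the change between the first and third year, change the value of periods to 2.

Output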

Average_fare_x    NaN
Average_fare_y     AZ #state with max change between 1st & 2nd year
Average_fare       WV #state with max change between 2nd & 3rd year
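A minimal sketch of the periods=2 variant, assuming a small frame with the renamed columns used in this answer and a few rows from the question rounded to two decimals. With periods=2 the first two columns are entirely NaN. As far as I know, recent pandas releases warn about idxmax over an all-NaN column and later ones may reject it, so the sketch drops those columns first. The names change and top are made up for illustration.

```python
import pandas as pd

# Hypothetical frame using the renamed columns from the answer above;
# values are a few rows from the question, rounded to 2 decimals.
df = pd.DataFrame({
    'State_name': ['AK', 'AZ', 'IA', 'KS'],
    'Average_fare_x': [599.37, 324.64, 428.15, 452.31],  # 2017
    'Average_fare_y': [577.79, 396.90, 445.63, 490.02],  # 2018
    'Average_fare': [585.94, 389.55, 462.61, 420.18],    # 2019
})

# periods=2 compares each column with the one two positions to its left,
# so only the 2019 column gets values (2019 relative to 2017).
change = df.set_index('State_name').pct_change(periods=2, axis='columns')

# Drop the columns that are entirely NaN, then ask for the row label
# (the state) of the maximum in each remaining column.
top = change.dropna(axis='columns', how='all').idxmax()
print(top)
```

The result is a Series with one entry per remaining column, so top['Average_fare'] holds the state with the largest 2017 to 2019 change in this sample.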

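Answer 2: (score: 2)

Since you are asking about a change in price, the overall increase or decrease in fare is the difference between 2017 and the last available data (2019). So you can compute the ratio of the two and then take the maximum to find the row with the highest growth. The code below does this by sorting in descending order and keeping the first row.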

growth['variation_fare'] =  growth['Average Fare ($)'] / growth['Average Fare ($)_x']
growth = growth.sort_values(['variation_fare'],ascending=False)
print(growth.head(1))

Example:

import pandas as pd
a = {'State':['AK','AL','AR','AZ','CA'],'2017':[100,200,300,400,500],'2018':[120,242,324,457,592],'2019':[220,393,484,593,582]}
growth = pd.DataFrame(a)
growth['2018-2017 variation'] = (growth['2018'] / growth['2017']) - 1
growth['2019-2018 variation'] = (growth['2019'] / growth['2018']) - 1
growth['total variation'] = (growth['2019'] / growth['2017']) - 1
growth = growth.sort_values(['total variation'],ascending=False)
print(growth.head(5)) #Showing top 5

Output:

  State  2017  2018  2019  2018-2017 variation  2019-2018 variation  total variation
0    AK   100   120   220               0.2000             0.833333         1.200000
1    AL   200   242   393               0.2100             0.623967         0.965000
2    AR   300   324   484               0.0800             0.493827         0.613333
3    AZ   400   457   593               0.1425             0.297593         0.482500
4    CA   500   592   582               0.1840            -0.016892         0.164000
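If only the top few rows are needed, sorting the whole frame is not strictly necessary. A minimal sketch using DataFrame.nlargest on the same toy data as the example above (the name top3 is made up for illustration):

```python
import pandas as pd

# Same toy data as in the example above.
a = {'State': ['AK', 'AL', 'AR', 'AZ', 'CA'],
     '2017': [100, 200, 300, 400, 500],
     '2018': [120, 242, 324, 457, 592],
     '2019': [220, 393, 484, 593, 582]}
growth = pd.DataFrame(a)
growth['total variation'] = growth['2019'] / growth['2017'] - 1

# Keep the 3 rows with the largest total variation, in descending order.
top3 = growth.nlargest(3, 'total variation')
print(top3[['State', 'total variation']])
```

Passing 1 instead of 3 returns just the single state with the highest growth.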