I have a data frame named growth with 4 columns.
    State Name  Average Fare ($)_x  Average Fare ($)_y  Average Fare ($)
0           AK          599.372368          577.790640        585.944324
1           AL          548.825867          545.144447        555.939466
2           AR          496.033146          511.867026        513.761296
3           AZ          324.641818          396.895324        389.545267
4           CA          368.937971          376.723839        366.918761
5           CO          502.611572          537.206439        531.191893
6           CT          394.105453          388.772428        370.904182
7           DC          390.872738          382.326510        392.394165
8           FL          324.941100          329.728524        337.249248
9           GA          485.335737          480.606365        489.574241
10          HI          326.084793          335.547369        298.709998
11          IA          428.151682          445.625840        462.614195
12          ID          482.092567          475.822275        491.714945
13          IL          329.449503          349.938794        346.022226
14          IN          391.627917          418.945137        412.242053
15          KS          452.312058          490.024059        420.182836
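The last three columns are the average fare per state for each year. The second, third, and fourth columns are 2017, 2018, and 2019 respectively. I want to find which state has had the highest fare growth since 2017.
I tried the code below, and it gave some output that I couldn't really understand. I just need to find the state with the highest fare increase since 2017.
My code: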
growth[['Average Fare ($)_x','Average Fare ($)_y','Average Fare ($)']].pct_change()
Answer 0 (score: 3)
growth[['Average Fare ($)_x','Average Fare ($)_y','Average Fare ($)']].pct_change(axis='columns')
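This should give you the percent change from each year to the next. By default, pct_change works down the rows (axis=0), so it compares each state with the state in the row above it, which is why the original output was hard to interpret. Passing axis='columns' compares each year's column with the previous year's column instead.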
growth['variation_percentage'] = growth[['Average Fare ($)_x','Average Fare ($)_y','Average Fare ($)']].pct_change(axis='columns').sum(axis=1)
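This should give you the cumulative percent change. Strictly speaking, it is the sum of the two year-over-year changes, which only approximates the compounded change from 2017 to 2019.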
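To make the difference between summing yearly changes and the actual 2017 to 2019 change concrete, here is a minimal sketch on three rows copied from the question's table. The names fare_cols, yearly, summed, and compounded are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Three rows copied from the question's table (AK, AL, AR).
growth = pd.DataFrame({
    'State Name': ['AK', 'AL', 'AR'],
    'Average Fare ($)_x': [599.372368, 548.825867, 496.033146],  # 2017
    'Average Fare ($)_y': [577.790640, 545.144447, 511.867026],  # 2018
    'Average Fare ($)': [585.944324, 555.939466, 513.761296],    # 2019
})
fare_cols = ['Average Fare ($)_x', 'Average Fare ($)_y', 'Average Fare ($)']

# Year-over-year changes; the first column is NaN because it has no previous year.
yearly = growth[fare_cols].pct_change(axis='columns')

# Sum of the yearly changes (what the answer stores in 'variation_percentage').
summed = yearly.sum(axis=1)

# Actual change from 2017 to 2019, computed directly from the two end columns.
compounded = growth['Average Fare ($)'] / growth['Average Fare ($)_x'] - 1

# Compounding the yearly changes reproduces the direct 2017 -> 2019 change.
recompounded = (1 + yearly[fare_cols[1:]]).prod(axis=1) - 1

print(pd.DataFrame({
    'State Name': growth['State Name'],
    'summed': summed,
    'compounded': compounded,
}))
```

For small yearly changes the two columns are close, so the ranking of states will often be the same, but the compounded figure is the one that matches "growth since 2017".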
Answer 1 (score: 3)
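You could do this (df and the column names below are as written in this answer; adjust them to match your own data frame):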
df.set_index('State_name').pct_change(periods = 1, axis='columns').idxmax()
If you want the change between the first and the third year instead, change the value of periods to 2.
Output
Average_fare_x NaN
Average_fare_y AZ #state with max change between 1st & 2nd year
Average_fare WV #state with max change between 2nd & 3rd year
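As a rough sketch of the periods=2 variant, the snippet below uses the question's own column names and four rows copied from its table. Only the last column of the result is kept, since the first two columns have no value two years earlier and are entirely NaN. The names fares, change_2017_2019, and top_state are illustrative, and the result only reflects the rows sampled here, not the full data set.

```python
import pandas as pd

# Four rows copied from the question's table (AZ, HI, IA, KS).
growth = pd.DataFrame({
    'State Name': ['AZ', 'HI', 'IA', 'KS'],
    'Average Fare ($)_x': [324.641818, 326.084793, 428.151682, 452.312058],  # 2017
    'Average Fare ($)_y': [396.895324, 335.547369, 445.625840, 490.024059],  # 2018
    'Average Fare ($)': [389.545267, 298.709998, 462.614195, 420.182836],    # 2019
})

# Index by state so that idxmax() returns a state code rather than a row number.
fares = growth.set_index('State Name')

# periods=2 compares each column with the one two positions to its left,
# so only the last column (2019 vs 2017) holds real values.
change_2017_2019 = fares.pct_change(periods=2, axis='columns').iloc[:, -1]

top_state = change_2017_2019.idxmax()
print(top_state, change_2017_2019.loc[top_state])
```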
Answer 2 (score: 2)
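Since you are talking about a change in price, the overall increase or decrease in fare is the difference between 2017 and the last available data (2019). So you can compute that ratio and then simply take the max() to find the row with the largest growth.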
growth['variation_fare'] = growth['Average Fare ($)'] / growth['Average Fare ($)_x']
growth = growth.sort_values(['variation_fare'],ascending=False)
print(growth.head(1))
Example:
import pandas as pd
a = {'State':['AK','AL','AR','AZ','CA'],'2017':[100,200,300,400,500],'2018':[120,242,324,457,592],'2019':[220,393,484,593,582]}
growth = pd.DataFrame(a)
growth['2018-2017 variation'] = (growth['2018'] / growth['2017']) - 1
growth['2019-2018 variation'] = (growth['2019'] / growth['2018']) - 1
growth['total variation'] = (growth['2019'] / growth['2017']) - 1
growth = growth.sort_values(['total variation'],ascending=False)
print(growth.head(5)) #Showing top 5
Output:
  State  2017  2018  2019  2018-2017 variation  2019-2018 variation  total variation
0    AK   100   120   220               0.2000             0.833333         1.200000
1    AL   200   242   393               0.2100             0.623967         0.965000
2    AR   300   324   484               0.0800             0.493827         0.613333
3    AZ   400   457   593               0.1425             0.297593         0.482500
4    CA   500   592   582               0.1840            -0.016892         0.164000
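If only the single top state is needed, sorting the whole frame is not required. One possible alternative, sketched here on the same toy data as the example above, is to look up the row label of the maximum with idxmax() and select that row. The variable names top_label and top_row are illustrative.

```python
import pandas as pd

# Same toy data as in the example above.
a = {'State': ['AK', 'AL', 'AR', 'AZ', 'CA'],
     '2017': [100, 200, 300, 400, 500],
     '2018': [120, 242, 324, 457, 592],
     '2019': [220, 393, 484, 593, 582]}
growth = pd.DataFrame(a)

# Relative change from 2017 to 2019.
growth['total variation'] = (growth['2019'] / growth['2017']) - 1

# idxmax() returns the index label of the largest value; .loc fetches that row.
top_label = growth['total variation'].idxmax()
top_row = growth.loc[top_label]

print(top_row['State'], top_row['total variation'])
```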