I am extracting data from a website using the following:
import pandas as pd
import matplotlib.pyplot as plt
stat_dict={'Disposals' : 0,
'Kicks' : 1,
'Marks' : 2,
'Handballs' : 3,
'Goals' : 4,
'Behinds' : 5,
'Hitouts' : 6,
'Tackles' : 7,
'Rebounds' : 8,
'Inside50s' : 9,
'Clearances': 9,  # NOTE: same index as 'Inside50s'; check against the page's table order
'Clangers' : 10,
'FreesFor' : 11,
'FreesAgainst' : 12,
'ContestedPosessions' : 13,
'UncontestedPosesseions' : 14,
'ContestedMarks' : 15,
'MarksInside50' : 16,
'OnePercenters' : 17,
'Bounces' : 18,
'GoalAssists' : 19,
'Timeplayed' : 20}
team_lower_case='fremantle'
player1="Fyfe, Nat"
stat_required='Disposals'
rounds=8
tables = pd.read_html("https://afltables.com/afl/stats/teams/" +str(team_lower_case)+"/2018_gbg.html")
for df in tables:
    df.drop(df.columns[rounds+1:], axis=1, inplace=True) # remove unwanted columns
    df.columns = df.columns.droplevel(0) # remove extra index level
stat_table=tables[stat_dict[stat_required]]
player_stat=stat_table[stat_table["Player"]==player1]
This produces the following:
Player R1 R2 R3 R4 R5 R6 R7 R8
8 Fyfe, Nat 22.0 29.0 38.0 25.0 43.0 27.0 33.0 36.0
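How do I set the index to the data in the 'Player' column? How do I turn the column headers into row labels, and vice versa? The output I am looking for is: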
Round Fyfe, Nat Neale,Lachie
R1 22 37
R2 29 28
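The final output I want is a scatter plot with the column headers (R1, R2, etc.) on the x-axis and the row data on the y-axis.
I feel I should be able to plot the DataFrame directly, but the only way I have had any success is the following: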
for round, disp in player_stat.iterrows():
    player1_list=[]
    player1_list.append(disp)
plt.style.use('ggplot')
plt.scatter(range(1,rounds+1), player1_list, label=player1)
plt.legend(loc="lower right")
plt.title("Disposals per round")
plt.xlabel("Rounds")
plt.ylabel("Disposals")
plt.ylim(bottom=0)  # 'ymin' was removed in newer matplotlib releases; 'bottom' is the current keyword
It seems I should be using .transpose, which gets close to what I want and outputs the following:
8
Player Fyfe, Nat
R1 22
R2 29
R3 38
R4 25
R5 43
R6 27
R7 33
R8 36
Answer 0 (score: 1)
Use set_index together with T (transpose):
stat_table=tables[stat_dict[stat_required]].set_index('Player').T
print (stat_table)
Player Ballantyne, Hayden Banfield, Bailey Blakely, Connor \
R1 10.0 12.0 17.0
R2 11.0 14.0 30.0
R3 18.0 11.0 21.0
R4 9.0 17.0 16.0
R5 14.0 18.0 23.0
R6 6.0 14.0 31.0
R7 14.0 20.0 21.0
R8 11.0 12.0 35.0
Player Brayshaw, Andrew Cerra, Adam Cox, Brennan Crowden, Mitch \
R1 12.0 NaN NaN NaN
R2 16.0 9.0 NaN 11.0
R3 7.0 10.0 NaN 13.0
R4 11.0 7.0 NaN 15.0
R5 17.0 15.0 NaN 17.0
R6 14.0 11.0 NaN 11.0
R7 14.0 18.0 8.0 8.0
R8 16.0 14.0 12.0 10.0
Player Duman, Taylin Fyfe, Nat Hamling, Joel ... Pearce, Danyle \
R1 NaN 22.0 NaN ... 12.0
R2 NaN 29.0 12.0 ... NaN
R3 NaN 38.0 11.0 ... NaN
R4 NaN 25.0 16.0 ... NaN
R5 15.0 43.0 11.0 ... NaN
R6 12.0 27.0 9.0 ... NaN
R7 8.0 33.0 12.0 ... NaN
R8 15.0 36.0 18.0 ... 18.0
Player Ryan, Luke Sandilands, Aaron Sheridan, Tom Sutcliffe, Cameron \
R1 16.0 16.0 NaN 17.0
R2 22.0 10.0 NaN NaN
R3 21.0 14.0 14.0 NaN
R4 14.0 15.0 11.0 NaN
R5 21.0 9.0 NaN NaN
R6 18.0 10.0 NaN NaN
R7 13.0 11.0 NaN NaN
R8 27.0 13.0 NaN NaN
Player Taberner, Matthew Tucker, Darcy Walters, Michael Wilson, Nathan \
R1 20.0 12.0 18.0 15.0
R2 16.0 NaN 26.0 23.0
R3 18.0 NaN 23.0 18.0
R4 19.0 15.0 21.0 14.0
R5 6.0 17.0 18.0 27.0
R6 NaN 17.0 2.0 15.0
R7 NaN 15.0 NaN 19.0
R8 NaN 13.0 NaN NaN
Player Totals
R1 358.0
R2 399.0
R3 387.0
R4 362.0
R5 407.0
R6 356.0
R7 346.0
R8 404.0
[8 rows x 32 columns]
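To see why the plain .transpose in the question produced a column headed "8" and a "Player" row, here is a minimal sketch on a small hand-built frame. The frame is an assumption that stands in for one scraped table; its values are copied from the sample output in the question, nothing is fetched from the site.

```python
import pandas as pd

# Synthetic stand-in for one stats table (values copied from the question's
# sample output, not downloaded).
stat_table = pd.DataFrame({
    "Player": ["Fyfe, Nat", "Neale,Lachie"],
    "R1": [22, 37],
    "R2": [29, 28],
})

# Plain transpose: 'Player' is still an ordinary column, so it turns into a
# row, and the old integer row labels (0, 1) turn into the column headers.
plain = stat_table.T

# Setting the index first moves the names into the row labels, so after the
# transpose they become the column headers and only the numbers remain.
by_round = stat_table.set_index("Player").T

print(plain)
print(by_round)
```

A side effect worth knowing: because the plain transpose mixes names and numbers in the same columns, its columns end up with object dtype, whereas the set_index version keeps them numeric.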
You can then select the column by player1:
player_stat=stat_table[player1]
print (player_stat)
R1 22.0
R2 29.0
R3 38.0
R4 25.0
R5 43.0
R6 27.0
R7 33.0
R8 36.0
Name: Fyfe, Nat, dtype: float64
Finally, plot:
plt.scatter((range(1,rounds+1)), stat_table[player1])
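If the round names (R1, R2, ...) should appear on the x-axis instead of 1..8, one possible approach is to plot against integer positions and then relabel the ticks from the index. This is only a sketch: the small frame below is a hand-typed excerpt of the transposed table printed above (first four rounds, two players), and the "Agg" backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed excerpt of the transposed table: rounds as rows, players as columns.
stat_table = pd.DataFrame(
    {
        "Fyfe, Nat": [22.0, 29.0, 38.0, 25.0],
        "Ryan, Luke": [16.0, 22.0, 21.0, 14.0],
    },
    index=["R1", "R2", "R3", "R4"],
)
players = ["Fyfe, Nat", "Ryan, Luke"]

positions = list(range(len(stat_table.index)))  # 0, 1, 2, ... one per round

plt.style.use("ggplot")
fig, ax = plt.subplots()
for player in players:
    # One scatter series per player, all sharing the same x positions.
    ax.scatter(positions, stat_table[player], label=player)

# Put the round names on the x-axis in place of the integer positions.
ax.set_xticks(positions)
ax.set_xticklabels(list(stat_table.index))

ax.set_title("Disposals per round")
ax.set_xlabel("Rounds")
ax.set_ylabel("Disposals")
ax.set_ylim(bottom=0)
ax.legend(loc="lower right")
# plt.show()  # uncomment when running interactively
```

With the real table, the same loop should work on stat_table directly; only the list of players needs changing.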
Answer 1 (score: 0)
You can try:
df = pd.DataFrame({'Player':['Fyfe, Nat','Neale,Lachie'],
'r1':[22,29],
'r2': [27, 25],
'r3': [30, 21]})
print(df)
Player r1 r2 r3
0 Fyfe, Nat 22 27 30
1 Neale,Lachie 29 25 21
df = df.set_index('Player').T
print(df)
Output:
Player Fyfe, Nat Neale,Lachie
r1 22 29
r2 27 25
r3 30 21
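The output requested in the question has a "Round" header over the row labels rather than "Player" over the columns. A minimal sketch of one way to get that, reusing the same toy frame, is to follow the transpose with rename_axis. The variable names wide and flat are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Fyfe, Nat", "Neale,Lachie"],
    "r1": [22, 29],
    "r2": [27, 25],
    "r3": [30, 21],
})

wide = (
    df.set_index("Player")
      .T
      # Name the row labels 'Round' and clear the leftover 'Player' header
      # that the transpose carried over to the columns.
      .rename_axis(index="Round", columns=None)
)
print(wide)

# If 'Round' should be an ordinary column instead of the index:
flat = wide.reset_index()
print(flat)
```

Keeping the rounds as the index (wide) is usually handier for plotting, while the reset_index version (flat) is closer to the literal layout shown in the question.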