I'm building a table and grouping it by a variable called 'passer_player_name':
data.loc[(data['play_type'] == 'pass') & (data['down'] <= 4)].groupby(by='passer_player_name')[['epa']].mean()
passer_index = data.loc[(data['play_type'] == 'pass') & (data['down'] <= 4)].groupby(by='passer_player_name')[['epa', 'success','yards_gained']].mean()
passer_index['attempts'] = data.loc[(data['play_type'] == 'pass') & (data['down'] <= 4)].groupby(by='passer_player_name')['epa'].count()
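(Side note: I think the same table could be built in one pass with named aggregation — just a sketch, assuming pandas 0.25 or newer:)

pass_plays = data.loc[(data['play_type'] == 'pass') & (data['down'] <= 4)]
passer_index = pass_plays.groupby('passer_player_name').agg(
    epa=('epa', 'mean'),                    # mean EPA per qualifying pass
    success=('success', 'mean'),            # mean success rate
    yards_gained=('yards_gained', 'mean'),  # mean yards gained
    attempts=('epa', 'count'))              # number of qualifying attempts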
This gives the following output (a few sample rows):
epa success yards_gained attempts
passer_player_name
L.Jackson 0.336 0.48 6.9 335
K.Cousins 0.295 0.50 7.1 363
P.Mahomes 0.285 0.50 7.4 368
The next thing I need to do requires me to subset/sort the table using the 'passer_player_name' column, but technically that isn't part of the table (it's the group index). I tried the following:
passer_index['team_names'] = data.loc[(data['play_type'] == 'pass') & (data['down'] <= 4)].groupby(by='passer_player_name').posteam
Unfortunately, this puts the following into the added 'team_names' column (this is one sample row):
(L.Jackson, [BAL, BAL, BAL, BAL, BAL, BAL, BAL...
How can I get a column that states the team name only once, i.e. a column that displays just 'BAL' (with each player's team obviously differing)?
To boil it down, since I obviously can't show the entire dataset or where the data comes from, my question is essentially: how do I get from a row that shows
(L.Jackson, [BAL, BAL, BAL, BAL, BAL, BAL, BAL...
to a row that shows only 'BAL'? How do I extract that value from this Series/sequence/whatever it is?
Answer 0 (score: 0):
Create a map for the team names, like this:
r = {'K.Murray': 'ARI',
'M.Ryan': 'ATL',
'L.Jackson': 'BAL',
'J.Allen': 'BUF',
'K.Allen': 'CAR',
'M.Trubisky': 'CHI',
'A.Dalton': 'CIN',
'B.Mayfield': 'CLE',
'D.Prescott': 'DAL',
'D.Lock': 'DEN',
'D.Blough': 'DET',
'A.Rodgers': 'GRE',
'D.Watson': 'HOU',
'J.Brissett': 'IND',
'N.Foles': 'JAC',
'P.Mahomes': 'KAN',
'P.Rivers': 'LOS',
'J.Goff': 'LOS',
'R.Fitzpatrick': 'MIA',
'K.Cousins': 'MIN',
'T.Brady': 'NEP',
'D.Brees': 'NOS',
'D.Jones': 'NYG',
'S.Darnold': 'NYJ',
'D.Carr': 'OAK',
'C.Wentz': 'PHI',
'D.Hodges': 'PIT',
'J.Garoppolo': 'SAN',
'R.Wilson': 'SEA',
'J.Winston': 'TAM',
'R.Tannehill': 'TEN',
'D.Haskins': 'WAS'}
Then you can merge it in like this:
passer_index['team_names'] = passer_index.index.map(r)
Output:
epa success yards_gained attempts team_names
passer_player_name
L.Jackson 0.336 0.48 6.9 335 BAL
K.Cousins 0.295 0.50 7.1 363 MIN
P.Mahomes 0.285 0.50 7.4 368 KAN
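As an alternative to maintaining the map by hand, the team can also be read straight from the data — a sketch, assuming each passer maps to a single posteam value within the filtered rows:

pass_plays = data.loc[(data['play_type'] == 'pass') & (data['down'] <= 4)]
# take the first team observed for each passer; assumes one team per passer
passer_index['team_names'] = pass_plays.groupby('passer_player_name')['posteam'].first()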
I wrote an HTML scraper, and I figure you could adapt it to help here; it pulls all the relevant info from https://fantasyfootballers.org/rb-running-back-nfl-stats/. As long as the #Look for table section has the right 'table' index, this should be able to scrape any table on a website — there are usually a few tables before the data you're after, so feel free to try it on other sites. I used it to grab the QBs for you from Wikipedia, where that line just needs to be table = soup.find_all('table')[0]
import requests
import csv, re
import pandas as pd
from bs4 import BeautifulSoup

# Main function
def getNFLContent(link, filename):
    # Request content
    result1 = requests.get(link)
    # Save source in var
    src1 = result1.content
    # Activate soup
    soup = BeautifulSoup(src1, 'lxml')
    # Look for table
    table = soup.find_all('table')[1]
    # Save in csv
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        for tr in table('tr'):
            row = [t.get_text(strip=True) for t in tr(['td', 'th'])]
            writer.writerow(row)

# Abbreviate a full name, e.g. 'Lamar Jackson' -> 'L.Jackson'
def abrvname(x):
    initial = x[0].capitalize()
    lnamepat = r'(\w*?$)'  # capture the last word (the surname)
    lname = re.search(lnamepat, x).groups()[0]
    return initial + '.' + lname

link = 'https://fantasyfootballers.org/rb-running-back-nfl-stats/'
filename = 'rbs.csv'
getNFLContent(link, filename)
df = pd.read_csv('rbs.csv')
df.insert(loc=1, column='abr_name', value=df.Name.apply(abrvname))
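As a follow-up, if the scraped table also carries a team column, the hand-written map r from above could be built directly from the frame — a sketch, where the 'Team' column name is a guess and should be checked against the actual CSV header:

r = dict(zip(df['abr_name'], df['Team']))  # 'Team' is assumed; adjust to the real header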