                 Date             Visitor  V_PTS                 Home  H_PTS
0 2012-10-30 19:00:00  Washington Wizards     84  Cleveland Cavaliers     94
1 2012-10-30 19:30:00    Dallas Mavericks     99   Los Angeles Lakers     91
2 2012-10-30 20:00:00      Boston Celtics    107           Miami Heat    120
3 2012-10-31 19:00:00    Sacramento Kings     87        Chicago Bulls     93
4 2012-10-31 19:30:00     Houston Rockets    105      Detroit Pistons     96
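I'm trying to add to a scraped dataset to analyze attendance at NBA games. I want to add some columns, such as arena and capacity. Below is a function I wrote to add the arena. Is there a better way? The date is stored as a datetime, so how do I correctly extract the year in order to assign the right arena to teams that have built a new arena in the past few years (the Sacramento Kings)? Also, is there a way to add the arena capacity at the same time, killing two birds with one stone, instead of writing another function?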
def label_arena (hometeam):
    if hometeam == 'Toronto Raptors' :
        return 'Air Canada Centre'
    if hometeam == 'Miami Heat' :
        return 'American Airlines Arena'
    if hometeam == 'Dallas Mavericks' :
        return 'American Airlines Center'
    if hometeam == 'Orlando Magic' :
        return 'Amway Center'
    if hometeam == 'San Antonio Spurs' :
        return 'AT&T Center'
    if hometeam == 'Indiana Pacers' :
        return 'Bankers Life Fieldhouse'
    if hometeam == 'Brooklyn Nets' :
        return 'Barclays Center'
    if hometeam == 'Milwaukee Bucks' :
        return 'Bradley Center'
    if hometeam == 'Washington Wizards' :
        return 'Capital One Arena'
    if hometeam == 'Oklahoma City Thunder' :
        return 'Chesapeake Energy Arena'
    if hometeam == 'Memphis Grizzlies' :
        return 'FedExForum'
    if hometeam == 'Sacramento Kings' and df['Date'] < 2016:
        return 'Sleep Train Arena'
    if hometeam == 'Sacramento Kings' and df['Date'] > 2016:
        return 'Golden 1 Center'
Answer 0 (score: 0)
Here is one way to simplify the logic:
import pandas as pd
df = pd.DataFrame({'Date': ['2012-10-30', '2012-10-30', '2012-10-30',
                            '2012-10-31', '2017-10-31'],
                   'Home': ['Toronto Raptors', 'Los Angeles Lakers', 'Miami Heat',
                            'Sacramento Kings', 'Sacramento Kings']})
df['Date'] = pd.to_datetime(df['Date'])
d = {'Toronto Raptors': 'Air Canada Centre',
     'Los Angeles Lakers': 'Staples Center',
     'Miami Heat': 'American Airlines Arena'}
# general criteria
df['Arena'] = df['Home'].map(d)
# custom criteria
df.loc[(df['Home'] == 'Sacramento Kings') &
       (df['Date'].dt.year < 2016), 'Arena'] = 'Sleep Train Arena'
df.loc[(df['Home'] == 'Sacramento Kings') &
       (df['Date'].dt.year >= 2016), 'Arena'] = 'Golden 1 Center'
print(df)
        Date                Home                    Arena
0 2012-10-30     Toronto Raptors        Air Canada Centre
1 2012-10-30  Los Angeles Lakers           Staples Center
2 2012-10-30          Miami Heat  American Airlines Arena
3 2012-10-31    Sacramento Kings        Sleep Train Arena
4 2017-10-31    Sacramento Kings          Golden 1 Center
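The question also asks about adding capacity without writing a second function. One possible extension of the dictionary approach, offered only as a sketch, is a second dictionary keyed by arena name. The names `arena_by_team` and `capacity_by_arena` are hypothetical, and the two capacity figures are the ones quoted in another answer in this thread.

```python
import pandas as pd

# Small sample frame, for illustration only.
df = pd.DataFrame({'Home': ['Toronto Raptors', 'Miami Heat', 'Toronto Raptors']})

# team -> arena (same idea as the dictionary d in the answer)
arena_by_team = {'Toronto Raptors': 'Air Canada Centre',
                 'Miami Heat': 'American Airlines Arena'}

# arena -> capacity (hypothetical helper; figures taken from this thread)
capacity_by_arena = {'Air Canada Centre': 20511,
                     'American Airlines Arena': 19600}

df['Arena'] = df['Home'].map(arena_by_team)
# Capacity is looked up from the arena, not from the team.
df['Capacity'] = df['Arena'].map(capacity_by_arena)
print(df)
```

Because capacity is keyed by arena rather than by team, any row whose arena was set by a custom rule (such as the Sacramento Kings rows) would pick up the matching capacity, provided the capacity lookup runs after those rules.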
Answer 1 (score: 0)
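import pandas as pd
# Lookup table: one row per home team with its arena and capacity.
home_arenas_capacities = pd.DataFrame([
    ['Toronto Raptors', 'Air Canada Centre', 20511],
    ['Miami Heat', 'American Airlines Arena', 19600],
    ...
], columns=['Home', 'Arena', 'Capacity'])
# how='left' keeps every game, even when its home team is missing from the lookup table.
df = df.merge(home_arenas_capacities, on='Home', how='left')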
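For the Sacramento Kings, you want to merge on 'Home' and on whether 'Date' is in 2016 or later. That would probably require you to create a temporary column, then call df.merge(..., on=['Home','Date_GE_2016']) and drop the 'Date_GE_2016' column afterwards.
But a cleaner approach is to add a 'Season' column with values such as '2015-16' and '2016-17'. You will probably need it anyway as your dataset grows. (For the games dataframe, you can derive 'Season' automatically from the 'Date' value. For the 'home_arenas_capacities' dataframe, you will need to fill it in by hand.)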
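A minimal sketch of the 'Season' idea follows. The helper `season_label` and the rule that a season starts in October are assumptions made for illustration. The sample rows and the Sacramento capacity figures are also illustrative and should be checked against an authoritative source.

```python
import pandas as pd

# Sample game log (illustrative rows only).
games = pd.DataFrame({
    'Date': pd.to_datetime(['2012-10-31', '2016-04-09', '2016-10-27', '2017-01-15']),
    'Home': ['Sacramento Kings'] * 4,
})

def season_label(date):
    # Assumption: a season starts in October, so games played from
    # January to September belong to the season that began the year before.
    start = date.year if date.month >= 10 else date.year - 1
    return f"{start}-{(start + 1) % 100:02d}"

games['Season'] = games['Date'].apply(season_label)

# Hand-maintained lookup table: one row per (team, season).
# Capacity values are illustrative; verify them before use.
home_arenas_capacities = pd.DataFrame(
    [
        ['Sacramento Kings', '2012-13', 'Sleep Train Arena', 17317],
        ['Sacramento Kings', '2015-16', 'Sleep Train Arena', 17317],
        ['Sacramento Kings', '2016-17', 'Golden 1 Center', 17608],
    ],
    columns=['Home', 'Season', 'Arena', 'Capacity'],
)

# A left merge keeps every game and adds arena and capacity in one step.
result = games.merge(home_arenas_capacities, on=['Home', 'Season'], how='left')
print(result)
```

Keying on season rather than calendar year also handles games played early in 2016: the April 2016 row falls in '2015-16' and is matched to the old arena, which a plain year comparison would get wrong.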
Answer 2 (score: 0)
Here is an approach using numpy.select, if you are not opposed to numpy:
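import numpy as np
conditions = [
    df['Home'] == 'Toronto Raptors',
    df['Home'] == 'Miami Heat',
    df['Home'] == 'Dallas Mavericks',
    ...
    (df['Home'] == 'Sacramento Kings') & (df['Date'].dt.year < 2016),
    # >= rather than > so that games played in 2016 are not left unmatched
    (df['Home'] == 'Sacramento Kings') & (df['Date'].dt.year >= 2016)]
choices = [
    'Air Canada Centre',
    'American Airlines Arena',
    'American Airlines Center',
    ...
    'Sleep Train Arena',
    'Golden 1 Center']
# A string default avoids mixing the numeric default (0) with string choices;
# 'Unknown' is a placeholder label for rows that match no condition.
df['arena'] = np.select(conditions, choices, default='Unknown')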
Note that for the df['Date'] conditions to work, df['Date'] needs to be a datetime series. If you have not converted it already, you can do so with df['Date'] = pd.to_datetime(df['Date']).
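As a small illustration of that note, the sketch below converts string dates and then builds a year-based boolean mask. The sample values and the name `is_before_2016` are made up for the example.

```python
import pandas as pd

# Dates as they might arrive from a scrape: plain strings.
df = pd.DataFrame({'Date': ['2012-10-30 19:00:00', '2017-10-31 19:30:00']})

# Convert to datetime so that the .dt accessor becomes available.
df['Date'] = pd.to_datetime(df['Date'])

# Boolean mask usable inside np.select conditions or df.loc.
is_before_2016 = df['Date'].dt.year < 2016
print(is_before_2016)
```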