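I was able to solve this problem myself, but the way I did it feels extremely inefficient. I'm hoping someone can suggest an alternative, because this is not an ideal approach.
I have data for every NFL game since the 2009 season. The dataset includes a column for the game date but no column for the season, so I want to create one. The NFL plays some games in January (and the Super Bowl in early February), so I can't simply derive the season from the calendar year.
Here is the extremely inefficient solution I came up with: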
import pandas as pd

# Create list of season years
season_years = [2009,2010,2011,2012,2013,2014,2015,2016,2017,2018]
# Initialize dictionary of seasons
seasons = {}
# Iterate over season years to add start and end dates to seasons dictionary
# Used Mar 1 and Feb 28 as start and end dates due to Super Bowl being played in early Feb every year
for year in season_years:
    seasons[year] = {'start': str(year) + '-03-01', 'end': str(year + 1) + '-02-28'}
# Turn seasons dictionary into dataframe
seasons_df = pd.DataFrame(seasons).transpose()
# Convert start and end dates in dataframe to datetime objects
seasons_df['start'] = pd.to_datetime(seasons_df['start'])
seasons_df['end'] = pd.to_datetime(seasons_df['end'])
# Initialize new column 'season' with None values
data['season'] = None
# Iterate over season years, add year to season column if game date is between start and end for that season
for year in season_years:
    in_season = pd.to_datetime(data['game_date']).between(seasons_df.loc[year, 'start'], seasons_df.loc[year, 'end'])
    data.loc[in_season, 'season'] = year
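This works, but it's clumsy: I have to loop over a Python list to create the new column. There must be a better way.
Edit: The data can be downloaded from Kaggle here: https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016/version/6?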
Answer 0 (score: 0)
You can use pandas.date_range to generate the season boundaries, then use pandas.cut to assign each game date to a season:
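# 10 yearly boundaries on March 1 define 9 seasons (2009 through 2017); raise periods to cover later seasons
bins = pd.date_range('2009-03-01', periods=10, freq=pd.offsets.DateOffset(years=1))
bins = pd.Series(bins, index=bins.year)
# Convert game_date with pd.to_datetime in case it is stored as strings
data['season'] = pd.cut(pd.to_datetime(data['game_date']), bins, labels=bins.index[:-1]).astype(int)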
where bins looks like this:
# print bins
2009 2009-03-01
2010 2010-03-01
2011 2011-03-01
2012 2012-03-01
2013 2013-03-01
2014 2014-03-01
2015 2015-03-01
2016 2016-03-01
2017 2017-03-01
2018 2018-03-01
dtype: datetime64[ns]
The result for a random sample of game dates:
# print data.sample(10).sort_values('game_date')
game_date season
77 2010-03-19 2010
177 2010-06-27 2010
547 2011-07-02 2011
720 2011-12-22 2011
775 2012-02-15 2011
847 2012-04-27 2012
888 2012-06-07 2012
1636 2014-06-25 2014
1696 2014-08-24 2014
2010 2015-07-04 2015
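As a quick way to check the idea end to end, here is a minimal, self-contained sketch. The sample dates are made up for illustration and do not come from the Kaggle data, and the names `games`, `season_cut`, and `season_arith` are hypothetical. It passes `right=False` to `pandas.cut` so that a date falling exactly on March 1 would land in the season that starts that day, and it compares the result with a plain arithmetic rule (subtract one from the calendar year for January and February games). That rule is an alternative not mentioned above, and it assumes no season ever runs past the end of February.

```python
import pandas as pd

# Hypothetical sample of game dates (not taken from the Kaggle data set)
games = pd.DataFrame({
    "game_date": [
        "2009-09-10", "2010-01-03", "2010-02-07",
        "2010-09-09", "2017-12-31", "2018-02-04",
    ]
})
dates = pd.to_datetime(games["game_date"])

# Approach 1: pd.cut with yearly boundaries on March 1.
# right=False makes each bin [start, next_start), so a boundary date
# belongs to the season that begins on it.
edges = pd.date_range("2009-03-01", periods=10, freq=pd.DateOffset(years=1))
games["season_cut"] = pd.cut(
    dates, bins=edges, right=False, labels=edges.year[:-1]
).astype(int)

# Approach 2 (assumption: seasons never extend past February):
# January and February games belong to the previous calendar year's season.
games["season_arith"] = dates.dt.year - (dates.dt.month < 3).astype(int)

print(games)
```

If the two columns agree on your real data, the arithmetic version may be preferable because it needs no list of boundaries and does not fail on dates outside the range covered by the bins.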