I have a dataset of tennis match results, as follows:
import pandas as pd

tennis_cols = ['Year', 'TourNo', 'MatchNo', 'Round', 'Winner', 'Loser']
tennis_rslts = [
    [2018, 1, 1, 'QF', 'PlayerA', 'PlayerB'],
    [2018, 1, 2, 'QF', 'PlayerC', 'PlayerD'],
    [2018, 1, 3, 'QF', 'PlayerE', 'PlayerF'],
    [2018, 1, 4, 'QF', 'PlayerG', 'PlayerH'],
    [2018, 1, 5, 'SF', 'PlayerA', 'PlayerC'],
    [2018, 1, 6, 'SF', 'PlayerE', 'PlayerG'],
    [2018, 1, 7, 'F', 'PlayerA', 'PlayerE'],
]
dfTennis = pd.DataFrame(tennis_rslts, columns=tennis_cols)
dfTennis
Year TourNo MatchNo Round Winner Loser
0 2018 1 1 QF PlayerA PlayerB
1 2018 1 2 QF PlayerC PlayerD
2 2018 1 3 QF PlayerE PlayerF
3 2018 1 4 QF PlayerG PlayerH
4 2018 1 5 SF PlayerA PlayerC
5 2018 1 6 SF PlayerE PlayerG
6 2018 1 7 F PlayerA PlayerE
I want to add a WinsToDate column containing the number of wins that the winner of each match had before that match, i.e.:
Year TourNo MatchNo Round Winner Loser WinsToDate
0 2018 1 1 QF PlayerA PlayerB 0
1 2018 1 2 QF PlayerC PlayerD 0
2 2018 1 3 QF PlayerE PlayerF 0
3 2018 1 4 QF PlayerG PlayerH 0
4 2018 1 5 SF PlayerA PlayerC 1 <-- PlayerA won MatchNo 1
5 2018 1 6 SF PlayerE PlayerG 1 <-- PlayerE won MatchNo 3
6 2018 1 7 F PlayerA PlayerE 2 <-- PlayerA won MatchNo 1 and 5
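My real-world dataset is large enough that iterating over it row by row is too slow. How can I get this result efficiently?
Essentially, for each row I want to count the rows whose Winner matches that row's Winner and whose MatchNo is less than that row's MatchNo.
**Update** I can count the number of times each winner appears in the DataFrame with the following: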
dfTennis['Count'] = list(map(lambda x : len(dfTennis[(dfTennis['Winner'] == x)]), dfTennis['Winner']))
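But this counts all occurrences, not just those before the current row.
Answer 0 (score: 0)
Oddly enough, I am going to answer my own question.
The code needed to compute the WinsToDate column is: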
dfTennis['WinsToDate'] = list(map(lambda x : len(dfTennis[(dfTennis['Winner'] == dfTennis.iloc[x]['Winner']) &
(dfTennis['MatchNo'] < dfTennis.iloc[x]['MatchNo'])]), dfTennis.index.values))
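Passing the index values to the lambda function lets me look up the Winner and MatchNo fields of each row and apply the logic I need.
I would welcome any better solutions, but this seems to meet my needs.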
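As a possible alternative, here is a minimal sketch that avoids filtering the whole DataFrame once per row. It assumes that sorting by Year, TourNo and MatchNo puts the matches in chronological order (an assumption, not something the question states), and then uses `groupby('Winner').cumcount()` to number each player's wins starting from 0, which is the same as the count of that player's earlier wins.

```python
import pandas as pd

tennis_cols = ['Year', 'TourNo', 'MatchNo', 'Round', 'Winner', 'Loser']
tennis_rslts = [
    [2018, 1, 1, 'QF', 'PlayerA', 'PlayerB'],
    [2018, 1, 2, 'QF', 'PlayerC', 'PlayerD'],
    [2018, 1, 3, 'QF', 'PlayerE', 'PlayerF'],
    [2018, 1, 4, 'QF', 'PlayerG', 'PlayerH'],
    [2018, 1, 5, 'SF', 'PlayerA', 'PlayerC'],
    [2018, 1, 6, 'SF', 'PlayerE', 'PlayerG'],
    [2018, 1, 7, 'F', 'PlayerA', 'PlayerE'],
]
dfTennis = pd.DataFrame(tennis_rslts, columns=tennis_cols)

# Assumption: this sort order is chronological for the real dataset.
dfTennis = dfTennis.sort_values(['Year', 'TourNo', 'MatchNo']).reset_index(drop=True)

# cumcount() numbers the rows within each Winner group 0, 1, 2, ...,
# so each row receives the number of earlier rows won by the same player.
dfTennis['WinsToDate'] = dfTennis.groupby('Winner').cumcount()

print(dfTennis)
```

Note that this counts earlier wins across the whole sort order, while the lambda version compares MatchNo only. Whether the two agree on data spanning several tournaments depends on how MatchNo is numbered there.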