这是数据框
MatchId EventCodeId EventCode Team1 Team2 Team1_Goals Team2_Goals xG_Team1 xG_Team2 CurrentPlaytime
0 865314 1029 Goal Home Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 457040
1 865314 1029 Goal Home Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 1405394
2 865314 2053 Goal Away Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 1898705
3 865314 2053 Goal Away Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 4388278
4 865314 1029 Goal Home Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 4507898
5 865314 1030 Cancel Goal Home Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 4517728
6 865314 1029 Goal Home Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 4956346
7 865314 1030 Cancel Goal Home Northampton Crawley Town 2 2 2.067663207769023 0.8130662505484256 4960633
8 865316 2053 Goal Away Coventry Bradford 0 0 1.0847662440468118 1.2526705617472387 447858
9 865316 2054 Cancel Goal Away Coventry Bradford 0 0 1.0847662440468118 1.2526705617472387 456361
新列将按如下方式创建:
for EventCodeId = 1029 and EventCode = Goal Home
new_col1 = CurrentPlaytime/3*10**4
for EventCodeId = 2053 and ventCode = Goal Away
new_col2 = CurrentPlaytime/3*10**4
对于其他所有EventCodeId
和EventCode
new_co1
和new_col2
将采用0.
这是我如何开始,但无法继续下去。请帮忙
new_col1 = []
new_col2 = []
def timeslot(EventCodeId, EventCode, CurrentPlaytime):
if x == 1029 and y == 'Goal Home':
new.Col1.append(z/(3*10**4))
elif x == 2053 and y == 'Goal Away':
new_col2.append(z/(3*10**4))
else:
new_col1.append(0)
new_col2.append(0)
return new_col1
return new_col2
df1['new_col1', 'new_col2'] = df1.apply(lambda x,y,z: timeslot(x['EventCodeId'], y['EventCode'], z['CurrentPlaytime']), axis=1)
TypeError: ("<lambda>() missing 2 required positional arguments: 'y' and 'z'", 'occurred at index 0')
答案 0 :(得分:2)
您不需要显式循环。尽可能使用矢量化操作。
使用numpy.where
:
s = df1['CurrentPlaytime']/3*10**4
mask1 = (df1['EventCodeId'] == 1029) & (df1['EventCode'] == 'Goal')
mask2 = (df1['EventCodeId'] == 2053) & (df1['EventCode'] == 'Away')
df1['new_col1'] = np.where(mask1, s, 0)
df1['new_col2'] = np.where(mask2, s, 0)