从多列创建新的pandas列

时间:2018-06-14 11:48:23

标签: python pandas function dataframe lambda

这是数据框

    MatchId EventCodeId EventCode   Team1   Team2   Team1_Goals Team2_Goals xG_Team1    xG_Team2    CurrentPlaytime
0   865314  1029    Goal Home   Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  457040
1   865314  1029    Goal Home   Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  1405394
2   865314  2053    Goal Away   Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  1898705
3   865314  2053    Goal Away   Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  4388278
4   865314  1029    Goal Home   Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  4507898
5   865314  1030    Cancel Goal Home    Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  4517728
6   865314  1029    Goal Home   Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  4956346
7   865314  1030    Cancel Goal Home    Northampton Crawley Town    2   2   2.067663207769023   0.8130662505484256  4960633
8   865316  2053    Goal Away   Coventry    Bradford    0   0   1.0847662440468118  1.2526705617472387  447858
9   865316  2054    Cancel Goal Away    Coventry    Bradford    0   0   1.0847662440468118  1.2526705617472387  456361

新列将按如下方式创建:

for EventCodeId = 1029 and EventCode = Goal Home
new_col1 = CurrentPlaytime/3*10**4

for EventCodeId = 2053 and ventCode = Goal Away
new_col2 = CurrentPlaytime/3*10**4

对于其他所有EventCodeIdEventCode new_co1new_col2将采用0.

这是我如何开始,但无法继续下去。请帮忙

new_col1 = []
new_col2 = []
def timeslot(EventCodeId, EventCode, CurrentPlaytime):
    if x == 1029 and y == 'Goal Home':
        new.Col1.append(z/(3*10**4))
    elif x == 2053 and y == 'Goal Away':
        new_col2.append(z/(3*10**4))
    else:
        new_col1.append(0)
        new_col2.append(0)
    return new_col1
    return new_col2



df1['new_col1', 'new_col2'] = df1.apply(lambda x,y,z: timeslot(x['EventCodeId'], y['EventCode'], z['CurrentPlaytime']), axis=1)  

TypeError: ("<lambda>() missing 2 required positional arguments: 'y' and 'z'", 'occurred at index 0')

1 个答案:

答案 0 :(得分:2)

您不需要显式循环。尽可能使用矢量化操作。

使用numpy.where

s = df1['CurrentPlaytime']/3*10**4

mask1 = (df1['EventCodeId'] == 1029) & (df1['EventCode'] == 'Goal')
mask2 = (df1['EventCodeId'] == 2053) & (df1['EventCode'] == 'Away')

df1['new_col1'] = np.where(mask1, s, 0)
df1['new_col2'] = np.where(mask2, s, 0)