Creating lists from a pandas DataFrame

Time: 2018-06-18 12:41:33

Tags: list dataframe pandas-groupby

I have a function that takes every (non-unique) MatchId together with its (xG_Team1, xG_Team2) pair and outputs an array, which is then summed into a single sse value.

The problem with the function is that it iterates over every row, so each MatchId is processed more than once. I want to stop this.

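For each distinct MatchId I need the corresponding home and away goals as lists, i.e. each iteration should use Home_Goal and Away_Goal, taken from the Home_Goal_Time and Away_Goal_Time columns of the DataFrame. The lists below do not seem to work.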

MatchId Event_Id   EventCode        Team1        Team2      Team1_Goals
0   842079  2053    Goal Away    Huachipato  Cobresal       0
1   842079  2053    Goal Away    Huachipato  Cobresal       0
2   842080  1029    Goal Home      Slovan    lava           3
3   842080  1029    Goal Home      Slovan    lava           3
4   842080  2053    Goal Away      Slovan    lava           3
5   842080  1029    Goal Home      Slovan    lava           3
6   842634  2053    Goal Away      Rosario   Boca Juniors   0
7   842634  2053    Goal Away      Rosario   Boca Juniors   0
8   842634  2053    Goal Away      Rosario   Boca Juniors   0
9   842634  2054  Cancel Goal Away Rosario   Boca Juniors   0

    Team2_Goals xG_Team1    xG_Team2    CurrentPlaytime  Home_Goal_Time Away_Goal_Time
0   2       1.79907     1.19893     2616183         0       87
1   2       1.79907     1.19893     3436780         0       115
2   1       1.70662     1.1995      3630545         121     0
3   1       1.70662     1.1995      4769519         159     0
4   1       1.70662     1.1995      5057143         0       169
5   1       1.70662     1.1995      5236213         175     0
6   2       0.82058     1.3465      2102264         0       70
7   2       0.82058     1.3465      4255871         0       142
8   2       0.82058     1.3465      5266652         0       176
9   2       0.82058     1.3465      5273611         0       0

For example, for MatchId = 842079: Home_Goal = [], Away_Goal = [87, 115]

import numpy as np

x1 = [1, 0, 0]
x2 = [0, 1, 0]
x3 = [0, 0, 1]
m = 1  # arbitrary constant used to optimise sse
k = 196
total_timeslot = 196
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)

results
sum(results.sum())

For the three matches above, the desired results are shown below. If I need an individual sse, sum(sum_squared_diff(x1, x2, x3, y)) gives me the following.

MatchId =  842079   =  3.984053038520635
MatchId =  842080   =  7.882189570700502
MatchId =  842634   =  5.929085973050213

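Given the size of the original data, what I am actually after is the sum of the sse values. For the sample data above, simply adding the values gives total sse = 17.79532858227135. Once I have achieved this, I will try to optimise the sse based on this figure by updating the arbitrary value m.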

These are the lists I want the function to iterate over.

Home_scored = s.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = s.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series

I then convert them to lists.

Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
 list

 Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]


Away_Goal 
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]

But the function still treats Home_Goal and Away_Goal as empty lists.
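One possible explanation, sketched below under the assumption that the function reads the global `Home_Goal` shown above: after `.tolist()` it is a list of lists, so `k in Home_Goal` compares an integer against whole lists and never matches. The names `hits` and `first_match_hits` are only illustrative.

```python
# Values copied from the Out[303] display above.
Home_Goal = [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]

# Membership test against the outer list: an int is never equal to a list,
# so no timeslot is ever found and the list behaves as if it were empty.
hits = [k for k in range(196) if k in Home_Goal]

# Membership test against one match's inner list does find entries,
# including the placeholder 0, which is probably not a real goal time.
first_match_hits = [k for k in range(196) if k in Home_Goal[0]]
```

If that is the cause, the function would need to receive one match's list at a time, with the placeholder zeros removed.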

1 answer:

Answer 0: (score: 1)

If you only want to consider one MatchId at a time, you should .groupby('MatchId') first:

df.groupby('MatchId').apply(...)
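A minimal sketch of what that could look like on the sample rows from the question. `match_sse` is a hypothetical helper name, and treating a goal time of 0 as "no goal in this row" is an assumption based on the expected `Home_Goal = []` for match 842079.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (only the columns used here).
df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

x1 = np.array([1, 0, 0])  # target for a timeslot with no goal
x2 = np.array([0, 1, 0])  # target for a timeslot with a home goal
x3 = np.array([0, 0, 1])  # target for a timeslot with an away goal
m = 1                     # arbitrary constant from the question
k = 196
total_timeslot = 196


def match_sse(group):
    """Hypothetical helper: sse for one match (one groupby group)."""
    # Assumption: a time of 0 is a placeholder, not a goal, so drop it.
    home_goal = set(group.loc[group["Home_Goal_Time"] > 0, "Home_Goal_Time"].tolist())
    away_goal = set(group.loc[group["Away_Goal_Time"] > 0, "Away_Goal_Time"].tolist())

    # The xG values repeat on every row of a match, so take the first row.
    xg1 = group["xG_Team1"].iloc[0]
    xg2 = group["xG_Team2"].iloc[0]
    y = np.array([1 - (xg1 * m + xg2 * m) / k, xg1 * m / k, xg2 * m / k])

    sse = 0.0
    for t in range(total_timeslot):
        if t in home_goal:
            target = x2
        elif t in away_goal:
            target = x3
        else:
            target = x1
        sse += float(np.sum((target - y) ** 2))
    return sse


cols = ["xG_Team1", "xG_Team2", "Home_Goal_Time", "Away_Goal_Time"]
sse_per_match = df.groupby("MatchId")[cols].apply(match_sse)  # one value per MatchId
total_sse = sse_per_match.sum()
```

On the sample rows, `sse_per_match` should agree with the three per-match values quoted in the question, and `total_sse` with their sum. Each match is processed once, and the goal lists are built inside the group instead of being read from globals.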