Given the distinct values of columns in a pandas DataFrame, how do I create lists from multiple rows/columns?

Time: 2018-06-14 16:13:22

Tags: pandas function dataframe

Here is an extract from a large DataFrame.

import pandas as pd
import numpy as np

ss = {'EventCode': pd.Series(['Goal Away', 'Goal Away', 'Goal Home', 'Goal Away','Goal Home', 'Goal Home', 'Cancel Goal Home', 'Goal Home','Goal Home', 'Goal Away', 'Goal Away', 'Goal Home','Goal Away', 'Goal Home', 'Goal Away', 'Goal Home']),
'Team1_Goal': pd.Series([2,2,2,2,2,0,0,5,5,5,5,5,5,5,5,5]),
'Team2_Goal': pd.Series([3,3,3,3,3,3,0,0,4,4,4,4,4]),
'xG_Team1': pd.Series([1.44344827512893,1.44344827512893,1.44344827512893,1.44344827512893,1.44344827512893,2.665637391386118,2.665637391386118,1.1554900289157282,1.1554900289157282,1.1554900289157282,1.1554900289157282,1.1554900289157282,1.1554900289157282,1.1554900289157282,1.1554900289157282,1.1554900289157282]),
'xG_Team2': pd.Series([1.5713173919057721,1.5713173919057721,1.5713173919057721,1.5713173919057721,1.5713173919057721,0.5207680077479664,0.5207680077479664,1.7456786951765073,1.7456786951765073,1.7456786951765073,1.7456786951765073,1.7456786951765073,1.7456786951765073,1.7456786951765073,1.7456786951765073,1.7456786951765073]),
'new_col1': pd.Series([0,0,179,0,190,123,0,29,75,0,0,118,0,143,0,190]),
'new_col2':pd.Series([100,163,0,181,0,0,0,0,0,97,112,0,140,0,186,0])}

df = pd.DataFrame(ss)

I have a function that takes the individual values of xG_Team1 and xG_Team2 (as a pair). This works.

x1 = [1,0,0] 
x2 = [0,1,0] 
x3 = [0,0,1]

# Constants
total_timeslot = 180
m = 1
k = 180
Home_Goal = [] # No Goal
Away_Goal = [] # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)

results

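The problem is that the function above only works when Home_Goal and Away_Goal are zero or empty lists. I want to fill Home_Goal and Away_Goal with the values from new_col1 and new_col2 respectively, for each xG_Team1/xG_Team2 pair. For instance, the pair xG_Team1 = 1.44344827512893 and xG_Team2 = 1.5713173919057721 is what the function above would use.

For example: Home_Goal = [179, 190], Away_Goal = [100, 163, 181]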

Any help is greatly appreciated.

1 answer:

Answer 0 (score: 1)

You can do it like this:

df['new_col'] = df['new_col1'] + df['new_col2']
result = df.groupby(['xG_Team1','xG_Team2','EventCode'])['new_col'].apply(list).reset_index()

The result is a new DataFrame with a new_col column that holds the lists for Goal Away and Goal Home.

Output:

xG_Team
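To connect the grouped lists back to the function in the question, here is a minimal sketch, not the answerer's code. It assumes that the nonzero entries of new_col1 are home-goal timeslots and the nonzero entries of new_col2 are away-goal timeslots, which matches the example in the question. The function names `expected_probs` and `sum_squared_diff_for_pair`, and the choice to pass the goal lists as arguments instead of using globals, are illustrative assumptions. Only the first seven rows of the sample data are used.

```python
import numpy as np
import pandas as pd

# First seven rows of the question's data (two distinct xG pairs).
df = pd.DataFrame({
    "xG_Team1": [1.44344827512893] * 5 + [2.665637391386118] * 2,
    "xG_Team2": [1.5713173919057721] * 5 + [0.5207680077479664] * 2,
    "new_col1": [0, 0, 179, 0, 190, 123, 0],  # assumed: home-goal timeslots, 0 = none
    "new_col2": [100, 163, 0, 181, 0, 0, 0],  # assumed: away-goal timeslots, 0 = none
})

TOTAL_TIMESLOT = 180
M = 1
K = 180

def expected_probs(xg_team1, xg_team2):
    """Same formula as my_function in the question: [no goal, home goal, away goal]."""
    return np.array([1 - (xg_team1 * M + xg_team2 * M) / K,
                     xg_team1 * M / K,
                     xg_team2 * M / K])

def sum_squared_diff_for_pair(y, home_goal, away_goal):
    """Variant of sum_squared_diff that takes the goal lists as arguments (hypothetical)."""
    x1, x2, x3 = np.eye(3)  # no goal, home goal, away goal
    ssd = []
    for t in range(TOTAL_TIMESLOT):
        if t in home_goal:
            target = x2
        elif t in away_goal:
            target = x3
        else:
            target = x1
        ssd.append(float(np.sum((target - y) ** 2)))
    return ssd

goal_lists = {}
results = {}
# sort=False keeps the pairs in order of first appearance.
for (xg1, xg2), grp in df.groupby(["xG_Team1", "xG_Team2"], sort=False):
    home = grp.loc[grp["new_col1"] != 0, "new_col1"].tolist()
    away = grp.loc[grp["new_col2"] != 0, "new_col2"].tolist()
    goal_lists[(xg1, xg2)] = (home, away)
    results[(xg1, xg2)] = sum_squared_diff_for_pair(expected_probs(xg1, xg2), home, away)

for pair, (home, away) in goal_lists.items():
    print(pair, "Home_Goal =", home, "Away_Goal =", away)
```

For the first pair this collects Home_Goal = [179, 190] and Away_Goal = [100, 163, 181], as in the question's example. Note that with range(180) the loop only visits timeslots 0 to 179, so values such as 181 or 190 never match any timeslot; whether that is intended depends on how the timeslots are defined.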