在PostgreSQL

时间:2016-01-24 19:59:12

标签: python sql postgresql python-2.7 pandas

我正在尝试完成一项有趣的研究,并为现有数据集添加新功能

ID      DATE           LEAGUE         HOME        AWAY    H_GOALS A_GOALS PREV_H_GOALS PREV_A_GOALS
9911 "2005-01-01" "Bundesliga 1" "Wolfsburg"  "Schalke 04"    2      1          NULL      NULL
9822 "2005-01-01" "Jupiler League" "Beveren" "Lokeren"        2      1          NULL      NULL
9823 "2005-01-01" "Jupiler League" "Waregem" "Westerlo"       2      3          NULL      NULL
9824 "2005-01-10" "Jupiler League" "Westerlo" "Beveren"       4      1           3         2
9932 "2005-01-10" "Bundesliga 1" "Bayern Munich" "Wolfsburg"  2      0          NULL       2
9933 "2005-01-10" "Bundesliga 1" "Ein Frankfurt" "Schalke 04" 0      1          NULL       1

现在,我想添加其他列,例如

  • MATCH_GOALS - 在上一场比赛
  • 中踢出了进球
  • PREV_RESULT - 上一场比赛的结果[1胜2平2负3]
  • ALL_GOALS_IN_SEASON - 本赛季球队的总进球数

我对案例#1的尝试不正确:

SELECT ROW_NUMBER() OVER (ORDER BY start_time DESC) AS rowno,
       LEAD(m.match_score) OVER (ORDER BY start_time DESC) as MATCH_GOALS, -- #1

FROM match as m
WHERE (m.home = m.home OR m.away = m.away) #fail
ORDER BY start_time DESC

您能建议适当的窗口函数(PostgreSQL)或修复示例吗?

1 个答案:

答案 0 :(得分:1)

我在SQL Server中测试了一个解决方案,但我发现PostgreSQL支持SQL Server中的所有好东西(CTE,Window函数等),所以这应该是一个好的开始。

您的数据结构对于直接窗口功能设备非常不友好,因此CTE以更容易处理的方式获取它:

ID, Date, League, Team, Goals

这允许获取先前的信息,因为团队存在于单个列中。

此外,首先CTE计算团队是否会导致该特定比赛。

第二个CTE实际上使用LAG窗口函数获取有关上一个匹配的信息。

;with PivCTE AS (
    SELECT ID, Date, League, Home AS Team, HGoals AS Goals, 
        (CASE WHEN HGoals > AGoals THEN 1 WHEN HGoals = AGoals THEN 2 ELSE 3 END) AS Result,
        SUM(HGoals) OVER (PARTITION BY Home ORDER BY (SELECT 1)) AS AllGoals
    FROM Match 
    UNION
    SELECT ID, Date, League, Away AS Team, AGoals AS Goals,
        (CASE WHEN HGoals < AGoals THEN 1 WHEN HGoals = AGoals THEN 2 ELSE 3 END) AS Result,
        SUM(AGoals) OVER (PARTITION BY Away ORDER BY (SELECT 1)) AS AllGoals 
    FROM Match
),
PrevData AS (
    SELECT ID, Date, League, Team, 
        LAG(Goals, 1, NULL) OVER (PARTITION BY Team ORDER BY Date) AS PrevGoals,
        LAG(Result, 1, NULL) OVER (PARTITION BY Team ORDER BY Date) AS PrevResult,
        AllGoals
    FROM PivCTE
)
select M.*, 
    PH.PrevGoals AS HomePrevGoals, PA.PrevGoals AS AwayPrevGoals, 
    PH.PrevResult AS HomePrevWin, PA.PrevResult AS AwayPrevWin,
    PH.AllGoals AS HomeAllGoals, PA.AllGoals AS AwayAllGoals
FROM Match M
    JOIN PrevData PH ON PH.ID = M.ID AND PH.Team = M.Home
    JOIN PrevData PA ON PA.ID = M.ID AND PA.Team = M.Away
ORDER BY M.Date DESC

设置数据:

create table Match (
    ID INT NOT NULL,
    Date DATE NOT NULL,
    League NVARCHAR(100) NOT NULL,
    Home NVARCHAR(100) NOT NULL,
    Away NVARCHAR(100) NOT NULL,
    HGoals INT NOT NULL,
    AGoals INT NOT NULL
)

insert into Match values 

(9911, '2005-01-01', 'Bundesliga 1', 'Wolfsburg',  'Schalke 04',    2,      1),
(9822, '2005-01-01', 'Jupiler League', 'Beveren', 'Lokeren',        2,      1),
(9823, '2005-01-01', 'Jupiler League', 'Waregem', 'Westerlo',       2,      3),
(9824, '2005-01-10', 'Jupiler League', 'Westerlo', 'Beveren',       4,      1),
(9932, '2005-01-10', 'Bundesliga 1', 'Bayern Munich', 'Wolfsburg',  2,      0),
(9933, '2005-01-10', 'Bundesliga 1', 'Ein Frankfurt', 'Schalke 04', 0 ,     1)