pandasql: count occurrences of pairs

Date: 2018-11-23 20:45:06

Tags: sql sqldf pandasql

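I am trying to count the number of games that teams A and B have played against each other. The dataset looks like this:

[Image: how the data looks in the notebook]

Teams 1 and 29 have played 2 games against each other, because each of them appears once as HomeTeam and once as AwayTeam. With my query, however, each of those games ends up in its own group with a count of one: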

SELECT HomeTeamID, AwayTeamID, Count(*) AS num_matches
FROM games GROUP BY HomeTeamID, AwayTeamID

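I know where the problem is, but I don't know how to fix it.

2 answers:

Answer 0 (score: 0)

Put the two teams in a consistent order, so that the grouping does not depend on which one was home and which was away.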

SELECT GREATEST(HomeTeamID, AwayTeamID) AS team1, LEAST(HomeTeamID, AwayTeamID) AS team2, COUNT(*) as num_matches
FROM games
GROUP BY team1, team2
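
One caveat for pandasql users: pandasql runs its queries on SQLite by default, and as far as I know SQLite has no GREATEST or LEAST functions. Its multi-argument MAX() and MIN() scalar functions do the same job. The sketch below tries that variant with Python's built-in sqlite3 module; the three sample rows and the added ORDER BY are assumptions for illustration, not part of the original question.

```python
import sqlite3

# Hypothetical sample rows (HomeTeamID, AwayTeamID):
# teams 1 and 29 meet twice, once at each venue; teams 1 and 5 meet once.
rows = [(1, 29), (29, 1), (1, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (HomeTeamID INTEGER, AwayTeamID INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)", rows)

# MAX(a, b) / MIN(a, b) with two arguments are scalar functions in SQLite,
# standing in for GREATEST / LEAST. ORDER BY is added only to make the
# output order deterministic.
query = """
SELECT MAX(HomeTeamID, AwayTeamID) AS team1,
       MIN(HomeTeamID, AwayTeamID) AS team2,
       COUNT(*) AS num_matches
FROM games
GROUP BY team1, team2
ORDER BY team1, team2
"""

pair_counts = conn.execute(query).fetchall()
print(pair_counts)
```

With pandasql itself, the same query string would presumably be passed to sqldf together with the namespace that holds the games DataFrame, for example sqldf(query, locals()).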

Answer 1 (score: 0)

Revised answer

Suppose you have the following data:

[Image: sample data]

Then you can simply join the list of all teams with the list of all games:

SELECT
    teams.TeamID,
    CASE WHEN teams.TeamID = games.HomeTeamID THEN games.AwayTeamID ELSE games.HomeTeamID END AS OtherTeamID,
    COUNT(*) AS GamesBetween
FROM (
    SELECT HomeTeamID AS TeamID FROM games
    UNION
    SELECT AwayTeamID FROM games
) AS teams
INNER JOIN games ON teams.TeamID = games.HomeTeamID OR teams.TeamID = games.AwayTeamID
GROUP BY
    teams.TeamID,
    CASE WHEN teams.TeamID = games.HomeTeamID THEN games.AwayTeamID ELSE games.HomeTeamID END

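This gives a result like the following, in which every pair appears once from each team's point of view:

[Image: grouped by team 1 and team 2]

Or this (left as an exercise):

[Image: grouped by team 1]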
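
The join query from this answer can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module (SQLite being pandasql's default backend), the three sample rows are made up, and the ORDER BY clause is an addition so that the output order is predictable.

```python
import sqlite3

# Hypothetical sample rows (HomeTeamID, AwayTeamID).
rows = [(1, 29), (29, 1), (1, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (HomeTeamID INTEGER, AwayTeamID INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)", rows)

# Each team is joined to every game it took part in; the CASE expression
# picks whichever side of that game is the opponent.
query = """
SELECT
    teams.TeamID,
    CASE WHEN teams.TeamID = games.HomeTeamID
         THEN games.AwayTeamID ELSE games.HomeTeamID END AS OtherTeamID,
    COUNT(*) AS GamesBetween
FROM (
    SELECT HomeTeamID AS TeamID FROM games
    UNION
    SELECT AwayTeamID FROM games
) AS teams
INNER JOIN games
    ON teams.TeamID = games.HomeTeamID OR teams.TeamID = games.AwayTeamID
GROUP BY
    teams.TeamID,
    CASE WHEN teams.TeamID = games.HomeTeamID
         THEN games.AwayTeamID ELSE games.HomeTeamID END
ORDER BY 1, 2
"""

games_between = conn.execute(query).fetchall()
for team_id, other_team_id, n_games in games_between:
    print(team_id, other_team_id, n_games)
```

Unlike the ordered-pair approach in the other answer, every pair shows up twice here, once per team, so the counts add up to twice the number of games.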