Postgres: many-to-many JOIN creates duplicate output

Date: 2020-12-30 18:39:02

Tags: sql postgresql

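I recently added a many-to-many JOIN to one of my queries to add a "tagging" feature. The many-to-many join itself works fine, but it now causes a previously working part of the query to output each record twice.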

          SELECT v.* 
          FROM "Server" AS s 
            JOIN "Vote" AS v ON (s.id = v."serverId")
            JOIN "_ServerToTag" st ON (s.id = st."A") 
          OFFSET 0 LIMIT 25;
 id  |        createdAt        | authorId | serverId 
-----+-------------------------+----------+----------
 190 | 2020-12-23 15:47:25.476 |     6667 |        3
 190 | 2020-12-23 15:47:25.476 |     6667 |        3
 194 | 2020-12-21 15:47:25.476 |     6667 |        3
 194 | 2020-12-21 15:47:25.476 |     6667 |        3

In the example above:

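  • Server is my main table, which holds a bunch of entries. Think of them as Reddit posts: each has a title and content, and the Vote table is used to count its "upvotes".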
 id |             title             
----+-------------------------------
  3 | test server 3
  • Vote is a very simple table: it holds the timestamp of the "upvote", who created it, and the Server.id it is assigned to.
  • _ServerToTag is a table with two columns, A and B. It connects Server to another table that contains the tags.
 A | B 
---+---
 3 | 1
 3 | 2
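
To reproduce the behaviour locally, the tables described above can be sketched roughly as follows. The column types and constraints are assumptions, since the real schema was not shown. The rows are the ones listed in this post.

```sql
-- Hypothetical minimal schema; types and constraints are assumed.
CREATE TABLE "Server" (
    id    integer PRIMARY KEY,
    title text NOT NULL
);

CREATE TABLE "Vote" (
    id          integer PRIMARY KEY,
    "createdAt" timestamp NOT NULL,
    "authorId"  integer NOT NULL,
    "serverId"  integer NOT NULL REFERENCES "Server" (id)
);

-- "B" points at the tag table, which is left out of this sketch.
CREATE TABLE "_ServerToTag" (
    "A" integer NOT NULL REFERENCES "Server" (id),
    "B" integer NOT NULL
);

INSERT INTO "Server" VALUES (3, 'test server 3');
INSERT INTO "Vote" VALUES
    (190, '2020-12-23 15:47:25.476', 6667, 3),
    (194, '2020-12-21 15:47:25.476', 6667, 3);
INSERT INTO "_ServerToTag" VALUES (3, 1), (3, 2);
```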

The above is a greatly simplified query; in reality, I apply sum to the query results to get the total number of votes.

The desired result is one without the duplicates:

 id  |        createdAt        | authorId | serverId 
-----+-------------------------+----------+----------
 190 | 2020-12-23 15:47:25.476 |     6667 |        3
 194 | 2020-12-21 15:47:25.476 |     6667 |        3

I'm really not sure why this is happening, so I have no idea how to fix it.

Any help would be greatly appreciated.

Edit:

DISTINCT works if I only want to query the Vote table, but not in a more complex query. In my case, the query looks more like this:

SELECT s.id, s.title, sum(case WHEN v."createdAt" >= '2020-12-01' AND v."createdAt" < '2021-01-01'
          THEN 1 ELSE 0 END ) AS "voteCount"
          FROM "Server" AS s 
            LEFT JOIN "Vote" AS v ON (s.id = v."serverId")
            LEFT JOIN "_ServerToTag" st ON (s.id = st."A")
          GROUP BY s.id, s.title;
 id |             title             | voteCount 
----+-------------------------------+-----------
  3 | test server 3                 |         4

In the above, I only need the voteCount column to be DISTINCT.

SELECT s.id, s.title, sum(DISTINCT case WHEN v."createdAt" >= '2020-12-01' AND v."createdAt" < '2021-01-01'
          THEN 1 ELSE 0 END ) AS "voteCount"
          FROM "Server" AS s 
            LEFT JOIN "Vote" AS v ON (s.id = v."serverId")
            LEFT JOIN "_ServerToTag" st ON (s.id = st."A")
          GROUP BY s.id, s.title;
 id |             title             | voteCount 
----+-------------------------------+-----------
  3 | test server 3                 |         1

The above sort of works, but it seems to count only one vote even when there are several.

2 answers:

Answer 0: (score: 0)

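The problem appears to be the join you added to _ServerToTag. Because there are multiple rows in _ServerToTag for each row in Server, the query returns multiple rows for each server: one for each matching row in _ServerToTag.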
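
A quick way to see this multiplication is to compare how many rows each table contributes per server with how many rows the combined join produces. This is only a diagnostic sketch built on the table and column names from the question; the output column names are made up for illustration.

```sql
-- Rows contributed per server by each table, next to the joined row count.
SELECT s.id,
       (SELECT COUNT(*) FROM "Vote" AS v2
         WHERE v2."serverId" = s.id)          AS vote_rows,
       (SELECT COUNT(*) FROM "_ServerToTag" AS st2
         WHERE st2."A" = s.id)                AS tag_rows,
       COUNT(*)                               AS joined_rows
  FROM "Server" AS s
  JOIN "Vote" AS v ON s.id = v."serverId"
  JOIN "_ServerToTag" AS st ON s.id = st."A"
 GROUP BY s.id;
```

For the server in the question, with two votes and two tag rows, joined_rows should be the product of the two counts. That matches the four rows shown in the question's first output.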

It appears that _ServerToTag was added to the query so that it only includes servers that have tags. If that is your intent, you can use:

SELECT v.id, v."authorId", v."serverId", COUNT(DISTINCT v."createdAt") AS TOTAL_VOTES
  FROM "Server" AS s 
  INNER JOIN "Vote" AS v
    ON s.id = v."serverId"
  INNER JOIN (SELECT DISTINCT "A" FROM "_ServerToTag") st
    ON s.id = st."A"
  WHERE v."createdAt" >= '2020-12-01' AND
        v."createdAt" < '2021-01-01'
  GROUP BY v.id, v."authorId", v."serverId"
  OFFSET 0 LIMIT 25

or

SELECT v.id, v."authorId", v."serverId", COUNT(DISTINCT v."createdAt") AS TOTAL_VOTES
  FROM "Server" AS s 
  INNER JOIN "Vote" AS v
    ON s.id = v."serverId"
  WHERE v."createdAt" >= '2020-12-01' AND
        v."createdAt" < '2021-01-01' AND
        s.id IN (SELECT "A" FROM "_ServerToTag")
  GROUP BY v.id, v."authorId", v."serverId"
  OFFSET 0 LIMIT 25

The second form may communicate the intent of the query better.

Edit

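If you want to be able to count entries that have no votes, you need to use an outer join to pull in the (possibly nonexistent) votes, and then use a CASE expression to count only the votes that exist: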

SELECT s.id, v.id, v."authorId", v."serverId",
       CASE
         WHEN v.id IS NULL THEN 0
         ELSE COUNT(DISTINCT v."createdAt")
       END AS TOTAL_VOTES
  FROM "Server" AS s 
  -- The date filter belongs in the ON clause: in WHERE it would discard
  -- the NULL-extended rows and turn the outer join into an inner join.
  LEFT OUTER JOIN "Vote" AS v
    ON s.id = v."serverId" AND
       v."createdAt" >= '2020-12-01' AND
       v."createdAt" < '2021-01-01'
  WHERE s.id IN (SELECT "A" FROM "_ServerToTag")
  GROUP BY s.id, v.id, v."authorId", v."serverId"
  OFFSET 0 LIMIT 25

You may not actually need the CASE expression, because COUNT ignores NULLs. You can probably get away with:

SELECT s.id, v.id, v."authorId", v."serverId",
       COUNT(DISTINCT v."createdAt") AS TOTAL_VOTES
  FROM "Server" AS s 
  -- Date filter kept in the ON clause so servers without votes survive.
  LEFT OUTER JOIN "Vote" AS v
    ON s.id = v."serverId" AND
       v."createdAt" >= '2020-12-01' AND
       v."createdAt" < '2021-01-01'
  WHERE s.id IN (SELECT "A" FROM "_ServerToTag")
  GROUP BY s.id, v.id, v."authorId", v."serverId"
  OFFSET 0 LIMIT 25
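
If the goal is a single vote count per server, as in the aggregated query from the question's edit, a sketch along the following lines may be closer to what is needed. It is an assumption about the intent, not part of the queries above. Instead of summing a 0/1 flag, it counts distinct vote ids, so each vote is counted once no matter how many tag rows repeat it. The FILTER clause on aggregates is available in PostgreSQL 9.4 and later.

```sql
SELECT s.id,
       s.title,
       -- Each vote id is counted once, even when tag rows duplicate it.
       COUNT(DISTINCT v.id) FILTER (
           WHERE v."createdAt" >= '2020-12-01'
             AND v."createdAt" <  '2021-01-01'
       ) AS "voteCount"
  FROM "Server" AS s
  LEFT JOIN "Vote" AS v ON s.id = v."serverId"
  LEFT JOIN "_ServerToTag" AS st ON s.id = st."A"
 GROUP BY s.id, s.title;
```

By contrast, sum(DISTINCT ...) over a 0/1 flag collapses every 1 into a single value, which is why the question's second attempt reported one vote.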

Answer 1: (score: 0)

Well, after the answers I received didn't really solve my problem, I went to a friend for help.

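I think my query was overly complicated and confusing. I was advised to use subqueries to reduce the complexity and make it easier to manage.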

My query now looks like this:

SELECT 
    s.id
,   s.title
,   COALESCE(v."VOTES", 0) AS "voteCount" 
FROM "Server" AS s
    -- Join tags
    INNER JOIN
    (
        SELECT
            st."A"
        ,   json_agg(
                        json_build_object(
                            'id', 
                            t.id, 
                            'tagName', 
                            t."tagName"
                        )
                    ) as "tagsArray"
        FROM
            "_ServerToTag" AS st
        INNER JOIN
            "Tag" AS t
        ON
            t.id = st."B"
        GROUP BY
            st."A"
    ) AS tag
    ON
        tag."A" = s.id
    -- Count votes
    LEFT JOIN 
    (
        SELECT
            "serverId"
        ,   COUNT(*) AS "VOTES" 
        FROM 
            "Vote" as v 
        WHERE 
            v."createdAt" >= '2020-12-01' AND 
            v."createdAt" <  '2021-01-01' 
        GROUP BY "serverId"
    ) as v 
    ON 
        s.id = v."serverId"
    OFFSET 0 LIMIT 25;

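This does exactly the same thing, but because each join selects only what I need, pre-aggregated to one row per server, it is more readable and gives me better control over the returned data.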