Question

I'd like some help creating a database view. My database schema is as follows:

products            (id, ignored_comments_ids (array))
activities          (id)
comments            (id)
activities_comments (activity_id comment_id)
products_comments   (product_id, comment_id)
offers              (product_id, activity_id)

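Now I need to create a view of all product comments, with a custom column named source:

source = 'OFFER': comments that come from the products.offers.activities.comments association
source = 'DIRECT': comments that come from the products.comments association

In addition, the view should exclude the comments listed in products.ignored_comments_ids.

How can I do this? The view must have product_id, source, and all the columns of the comments table.

I came up with the following view. How can I improve it?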

CREATE OR REPLACE VIEW all_comments AS
  WITH the_comments AS (
    SELECT
      comments.*,
      'OFFER'     AS source,
      products.id AS product_id
    FROM comments
    JOIN activities_comments ON activities_comments.comment_id = comments.id
    JOIN activities          ON activities.id = activities_comments.activity_id
    JOIN offers              ON offers.activity_id = activities.id
    JOIN products            ON products.id = offers.product_id
  UNION
    SELECT
      comments.*,
      'DIRECT'    AS source,
      products.id AS product_id
    FROM comments
    JOIN products_comments ON products_comments.comment_id = comments.id
    JOIN products          ON products.id = products_comments.product_id
  )
  SELECT DISTINCT ON (the_comments.id)
    the_comments.id,
    the_comments.name,
    the_comments.source,
    the_comments.product_id
  FROM the_comments
  JOIN products ON products.id = the_comments.product_id
  WHERE NOT to_json(products.ignored_comments_ids)::jsonb @> to_jsonb(the_comments.id)
  ORDER BY the_comments.id;

Answer 1

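UNION combines two sets of rows and also removes duplicate rows. UNION ALL just combines the two sets (and stops there). Because UNION ALL skips the work of searching for and removing duplicates, it is faster.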

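In your initial common table expression (CTE), the_comments, each side of the union is forced to produce a different constant in source, so a row from one side can never duplicate a row from the other. For example: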

select *
from (
    select 1 as id, 'OFFER' AS source
    union
    select 1 as id, 'DIRECT' AS source
    ) d
;
result:
  id   source  
 ---- -------- 
   1   DIRECT  
   1   OFFER

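Even though id 1 appears on both sides of the union, this example query returns 2 rows because the constants differ. UNION therefore never has a cross-branch duplicate to remove in your CTE, so use UNION ALL instead. (One caveat: UNION also collapses duplicates within a single branch, such as the same comment reaching the same product through two different offers. UNION ALL keeps those, so if that can happen in your data and matters to you, deduplicate that branch explicitly.)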
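A small sketch of the difference on the CTE's own shape, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (UNION and UNION ALL have the same semantics in both for this query). The data is hypothetical: comment 10 reaches product 1 through two activities, both offered for product 1, and is also attached to product 1 directly.

```python
import sqlite3

# Hypothetical toy data: comment 10 reaches product 1 via activities 1 and 2
# (both offered for product 1) and is also attached to product 1 directly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments            (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE activities          (id INTEGER PRIMARY KEY);
    CREATE TABLE activities_comments (activity_id INTEGER, comment_id INTEGER);
    CREATE TABLE offers              (product_id INTEGER, activity_id INTEGER);
    CREATE TABLE products_comments   (product_id INTEGER, comment_id INTEGER);

    INSERT INTO comments            VALUES (10, 'nice');
    INSERT INTO activities          VALUES (1), (2);
    INSERT INTO activities_comments VALUES (1, 10), (2, 10);
    INSERT INTO offers              VALUES (1, 1), (1, 2);
    INSERT INTO products_comments   VALUES (1, 10);
""")

# The two branches of the_comments; {op} is filled in with UNION or UNION ALL.
BRANCHES = """
    SELECT comments.id, 'OFFER' AS source, offers.product_id AS product_id
    FROM comments
    JOIN activities_comments ON activities_comments.comment_id = comments.id
    JOIN activities          ON activities.id = activities_comments.activity_id
    JOIN offers              ON offers.activity_id = activities.id
    {op}
    SELECT comments.id, 'DIRECT' AS source, products_comments.product_id
    FROM comments
    JOIN products_comments ON products_comments.comment_id = comments.id
    ORDER BY 2, 3
"""

def rows_for(op):
    # ORDER BY here only makes the demo output deterministic.
    return conn.execute(BRANCHES.format(op=op)).fetchall()

union_rows = rows_for("UNION")
union_all_rows = rows_for("UNION ALL")
print(union_rows)      # one DIRECT row and one OFFER row
print(union_all_rows)  # the OFFER row appears twice
```

The differing source constants keep the branches apart either way; the only row UNION removes here is the within-branch OFFER duplicate.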

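Although SELECT * is convenient, it should not be used in a view (although there are arguments both ways, e.g. here). Perhaps you did this only to simplify the question, but I hope it is not used like that for real. If the purpose of the view is to return only 4 columns, specify just those columns.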

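Although you need product_id in the output, it can come from offers.product_id or products_comments.product_id, so you don't actually need to join to the products table inside the CTE. After the CTE, products is needed only to read ignored_comments_ids for the exclusion filter.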

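Since we are now using UNION ALL, I can't see any benefit in SELECT DISTINCT ON (...), and I suspect it is just overhead that can be eliminated. Obviously I can't verify this, and it may depend entirely on your functional requirements. Also note that SELECT DISTINCT ON (id) keeps only one row per id, so it throws away one of the source values you have carefully introduced (and unless the ORDER BY also sorts on source, which one survives is not defined), e.g.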

select distinct on (id) id, source
from (
    select 1 as id, 'OFFER' AS source
    union
    select 1 as id, 'DIRECT' AS source
    ) d
;
result:
  id   source  
 ---- -------- 
   1   DIRECT

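Do not include an ORDER BY clause in any view; order only the final query. In other words, if you create a view, it will probably be used in several other queries. Each of those queries may have its own WHERE clause and need a different result order. If you order the view, you are just burning CPU cycles on work that is later thrown away. So remove the ORDER BY clause.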

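I would very much like to propose a different approach for the final WHERE clause, but I don't know enough about JSON to suggest an alternative with confidence. However, applying functions to column data in a WHERE clause is a common cause of poor performance, most obviously because it usually prevents the use of indexes on the columns those functions wrap. Finding a more efficient way to exclude the ignored comments is likely to be the biggest single improvement to your query's performance.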
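The general point about wrapping a column in a function can be seen with SQLite's EXPLAIN QUERY PLAN (SQLite is used only because it ships with Python; the same principle applies in PostgreSQL, where an expression index is the usual remedy). The table and index below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX comments_name_idx ON comments (name);
""")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Bare column compared with a constant: the index on name can be searched.
plain_plan = plan("SELECT id FROM comments WHERE name = 'x'")

# Same test with the column wrapped in a function: SQLite may still scan
# the whole index, but it can no longer search it for the value.
wrapped_plan = plan("SELECT id FROM comments WHERE lower(name) = 'x'")

print(plain_plan)    # a SEARCH using comments_name_idx
print(wrapped_plan)  # a SCAN
```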

So my suggestions lead to the following:

WITH the_comments
AS (
    SELECT
        comments.id
      , comments.name
      , 'OFFER' AS source
      , offers.product_id AS product_id
    FROM comments
    JOIN activities_comments ON activities_comments.comment_id = comments.id
    JOIN activities ON activities.id = activities_comments.activity_id
    JOIN offers ON offers.activity_id = activities.id
    UNION ALL
    SELECT
        comments.id
      , comments.name
      , 'DIRECT' AS source
      , products_comments.product_id AS product_id
    FROM comments
    JOIN products_comments ON products_comments.comment_id = comments.id
    )
SELECT
    the_comments.id
  , the_comments.name
  , the_comments.source
  , the_comments.product_id
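FROM the_comments
/* products is joined only to read ignored_comments_ids for the filter */
JOIN products ON products.id = the_comments.product_id
/* perhaps raise a separate question on this bit */
WHERE NOT to_json(products.ignored_comments_ids)::jsonb @> to_jsonb(the_comments.id);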
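To check the shape of the suggested view end to end, here is a sketch run against an in-memory SQLite database. SQLite is only a stand-in: it has no array type, so ignored_comments_ids is stored as JSON text and the exclusion uses json_each inside NOT EXISTS rather than the PostgreSQL jsonb test above. In PostgreSQL, assuming ignored_comments_ids is an integer[] column, a comparable filter would be `the_comments.id <> ALL (coalesce(products.ignored_comments_ids, '{}'))`, where the coalesce keeps comments for products whose array is NULL; treat that as an untested suggestion. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products            (id INTEGER PRIMARY KEY,
                                      ignored_comments_ids TEXT);  -- JSON array text
    CREATE TABLE activities          (id INTEGER PRIMARY KEY);
    CREATE TABLE comments            (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE activities_comments (activity_id INTEGER, comment_id INTEGER);
    CREATE TABLE products_comments   (product_id INTEGER, comment_id INTEGER);
    CREATE TABLE offers              (product_id INTEGER, activity_id INTEGER);

    -- Hypothetical sample data: product 1 ignores comment 11.
    INSERT INTO products            VALUES (1, '[11]'), (2, NULL);
    INSERT INTO activities          VALUES (1);
    INSERT INTO comments            VALUES (10, 'a'), (11, 'b'), (12, 'c');
    INSERT INTO activities_comments VALUES (1, 10), (1, 11);
    INSERT INTO offers              VALUES (1, 1);
    INSERT INTO products_comments   VALUES (1, 12), (2, 10);

    -- No ORDER BY and no DISTINCT in the view itself.
    CREATE VIEW all_comments AS
    WITH the_comments AS (
        SELECT comments.id AS id, comments.name AS name,
               'OFFER' AS source, offers.product_id AS product_id
        FROM comments
        JOIN activities_comments ON activities_comments.comment_id = comments.id
        JOIN activities          ON activities.id = activities_comments.activity_id
        JOIN offers              ON offers.activity_id = activities.id
        UNION ALL
        SELECT comments.id, comments.name,
               'DIRECT', products_comments.product_id
        FROM comments
        JOIN products_comments ON products_comments.comment_id = comments.id
    )
    SELECT the_comments.id AS id, the_comments.name AS name,
           the_comments.source AS source, the_comments.product_id AS product_id
    FROM the_comments
    JOIN products ON products.id = the_comments.product_id  -- only for the filter
    WHERE NOT EXISTS (
        SELECT 1
        FROM json_each(coalesce(products.ignored_comments_ids, '[]')) AS ignored
        WHERE ignored.value = the_comments.id
    );
""")

# Ordering (and any extra filtering) is left to each query that uses the view.
product_1 = conn.execute(
    "SELECT * FROM all_comments WHERE product_id = 1 ORDER BY id").fetchall()
everything = conn.execute(
    "SELECT * FROM all_comments ORDER BY product_id, id").fetchall()
print(product_1)
print(everything)
```

Comment 11 is dropped for product 1 because it is in that product's ignored list, while product 2, whose list is NULL, keeps its comment.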
