Filtering nulls after an outer join in postgres

Date: 2017-09-05 16:28:29

Tags: postgresql

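I'm currently trying to figure out how to filter a left join when NULL values are involved. Here is a simplified version of the schema I'm working with: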

CREATE TABLE bookclubs (
    bookclub_id UUID NOT NULL PRIMARY KEY
);

CREATE TABLE books (
    bookclub_id UUID NOT NULL,
    book_id UUID NOT NULL
);
ALTER TABLE books ADD CONSTRAINT books_pk PRIMARY KEY(bookclub_id, book_id);
ALTER TABLE books ADD CONSTRAINT book_to_bookclub FOREIGN KEY(bookclub_id)
  REFERENCES bookclubs(bookclub_id) ON UPDATE NO ACTION ON DELETE CASCADE;
CREATE INDEX books_bookclub_index ON books (bookclub_id);

CREATE TABLE book_reviews (
    bookclub_id UUID NOT NULL,
    book_id UUID NOT NULL,
    reviewer_id TEXT NOT NULL,
    rating int8 NOT NULL
);
ALTER TABLE book_reviews ADD CONSTRAINT book_reviews_pk PRIMARY KEY(bookclub_id, book_id, reviewer_id);
ALTER TABLE book_reviews ADD CONSTRAINT book_review_to_book FOREIGN KEY(bookclub_id,book_id)
  REFERENCES books(bookclub_id,book_id) ON UPDATE NO ACTION ON DELETE CASCADE;
CREATE INDEX book_review_to_book_index ON book_reviews ( bookclub_id, book_id);
CREATE INDEX book_review_by_reviewer ON book_reviews ( bookclub_id, reviewer_id, rating);

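I want a query that, for a given bookclub_id and reviewer_id, returns all the books that reviewer rated >= 3 or has not rated at all. Books they haven't rated have no entry in the book_reviews table, and that is not something I can change. rating is actually an enum, in case that's relevant, but I don't think it is.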

My first attempt failed:

SELECT *
FROM   books
       LEFT OUTER JOIN book_reviews
                    ON ( ( ( books.bookclub_id = book_reviews.bookclub_id )
                           AND ( books.book_id = book_reviews.book_id ) )
                         AND ( book_reviews.reviewer_id = 'alice' ) )
WHERE  books.bookclub_id = '00000000-0000-0000-0000-000000000000'
       AND book_reviews.rating != 1
       AND book_reviews.rating != 2;

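This drops the books that have no review from that user, which makes sense once I think about how the WHERE condition is actually applied. Here is the query plan: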

Nested Loop  (cost=0.30..16.39 rows=1 width=104)
  ->  Index Scan using book_reviews_pk on book_reviews  (cost=0.15..8.21 rows=1 width=72)
        Index Cond: ((bookclub_id = '00000000-0000-0000-0000-000000000000'::uuid) AND (reviewer_id = 'alice'::text))
        Filter: ((rating <> 1) AND (rating <> 2))
  ->  Index Only Scan using books_pk on books  (cost=0.15..8.17 rows=1 width=32)
        Index Cond: ((bookclub_id = '00000000-0000-0000-0000-000000000000'::uuid) AND (book_id = book_reviews.book_id))
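A minimal sketch of why the first attempt drops unrated books, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (both follow the SQL rule that comparing NULL with anything yields NULL). The tables mirror the schema above with UUIDs stored as TEXT; the book ids b1 to b3 and their ratings are hypothetical sample data.

```python
import sqlite3

# Sketch: SQLite stands in for PostgreSQL; UUIDs are stored as TEXT.
CLUB = "00000000-0000-0000-0000-000000000000"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (bookclub_id TEXT NOT NULL, book_id TEXT NOT NULL,
                        PRIMARY KEY (bookclub_id, book_id));
    CREATE TABLE book_reviews (bookclub_id TEXT NOT NULL, book_id TEXT NOT NULL,
                               reviewer_id TEXT NOT NULL, rating INTEGER NOT NULL,
                               PRIMARY KEY (bookclub_id, book_id, reviewer_id));
""")
# Hypothetical data: alice rated b1 a 5 and b2 a 1, and never rated b3.
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [(CLUB, "b1"), (CLUB, "b2"), (CLUB, "b3")])
conn.executemany("INSERT INTO book_reviews VALUES (?, ?, ?, ?)",
                 [(CLUB, "b1", "alice", 5), (CLUB, "b2", "alice", 1)])

# The first attempt from the question, selecting only two columns.
FIRST_ATTEMPT = """
    SELECT books.book_id, book_reviews.rating
    FROM books
    LEFT OUTER JOIN book_reviews
      ON books.bookclub_id = book_reviews.bookclub_id
     AND books.book_id = book_reviews.book_id
     AND book_reviews.reviewer_id = 'alice'
    WHERE books.bookclub_id = ?
      AND book_reviews.rating != 1
      AND book_reviews.rating != 2
    ORDER BY books.book_id
"""
rows = conn.execute(FIRST_ATTEMPT, (CLUB,)).fetchall()
print(rows)  # [('b1', 5)]

# b3 survives the LEFT JOIN with rating = NULL, but NULL != 1 evaluates to
# NULL (unknown), and WHERE keeps only rows whose condition is true.
print(conn.execute("SELECT NULL != 1").fetchone())  # (None,)
```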

So I added an explicit NULL check:

SELECT *
FROM   books
       LEFT OUTER JOIN book_reviews
                    ON ( ( ( books.bookclub_id = book_reviews.bookclub_id )
                           AND ( books.book_id = book_reviews.book_id ) )
                         AND ( book_reviews.reviewer_id = 'alice' ) )
WHERE  books.bookclub_id = '00000000-0000-0000-0000-000000000000'
       AND book_reviews.rating IS NULL
       OR ( book_reviews.rating != 1
          AND book_reviews.rating != 2);

This returns the correct results, but it looks very inefficient and brings the database to a halt. Here is the query plan:

Hash Left Join  (cost=18.75..52.56 rows=1346 width=104)
   Hash Cond: ((books.bookclub_id = book_reviews.bookclub_id) AND (books.book_id = book_reviews.book_id))
   Filter: (((books.bookclub_id = '00000000-0000-0000-0000-000000000000'::uuid) AND (book_reviews.rating IS NULL)) OR ((book_reviews.rating <> 1) AND (book_reviews.rating <> 2)))
   ->  Seq Scan on books  (cost=0.00..23.60 rows=1360 width=32)
   ->  Hash  (cost=18.69..18.69 rows=4 width=72)
         ->  Bitmap Heap Scan on book_reviews  (cost=10.23..18.69 rows=4 width=72)
               Recheck Cond: (reviewer_id = 'alice'::text)
               ->  Bitmap Index Scan on book_review_by_reviewer  (cost=0.00..10.22 rows=4 width=0)
                     Index Cond: (reviewer_id = 'alice'::text)

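I'm no expert at reading these plans, but the Filter moving up onto the join (with a full Seq Scan on books) seems bad. Is there an efficient way to structure the query so that I get the results I want? Thanks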

2 answers:

Answer 0 (score: 0)

Move the filters into the join condition:

SELECT *
FROM
    books
    LEFT OUTER JOIN
    book_reviews ON 
        books.bookclub_id = book_reviews.bookclub_id 
        AND books.book_id = book_reviews.book_id 
        AND book_reviews.reviewer_id = 'alice'
        AND book_reviews.rating != 1
        AND book_reviews.rating != 2
WHERE books.bookclub_id = '00000000-0000-0000-0000-000000000000'

Or, a little shorter:

AND book_reviews.rating not in (1, 2)
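A sketch that runs this answer's query (in the shorter NOT IN form) on SQLite as a stand-in for PostgreSQL, with hypothetical sample data: alice rated b1 a 5 and b2 a 1, and never rated b3. It shows one behavior worth checking against the requirement: once the rating test is part of the ON clause, it only decides whether a review row gets attached. The LEFT JOIN still keeps every book of the club, so b2, which alice rated 1, comes back with a NULL rating instead of being excluded.

```python
import sqlite3

# Sketch: SQLite stands in for PostgreSQL; UUIDs are stored as TEXT.
CLUB = "00000000-0000-0000-0000-000000000000"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (bookclub_id TEXT NOT NULL, book_id TEXT NOT NULL,
                        PRIMARY KEY (bookclub_id, book_id));
    CREATE TABLE book_reviews (bookclub_id TEXT NOT NULL, book_id TEXT NOT NULL,
                               reviewer_id TEXT NOT NULL, rating INTEGER NOT NULL,
                               PRIMARY KEY (bookclub_id, book_id, reviewer_id));
""")
# Hypothetical data: b1 rated 5, b2 rated 1, b3 unrated by alice.
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [(CLUB, "b1"), (CLUB, "b2"), (CLUB, "b3")])
conn.executemany("INSERT INTO book_reviews VALUES (?, ?, ?, ?)",
                 [(CLUB, "b1", "alice", 5), (CLUB, "b2", "alice", 1)])

# Rating filter inside ON: it restricts which review rows join, not which books appear.
ANSWER_0 = """
    SELECT books.book_id, book_reviews.rating
    FROM books
    LEFT OUTER JOIN book_reviews
      ON books.bookclub_id = book_reviews.bookclub_id
     AND books.book_id = book_reviews.book_id
     AND book_reviews.reviewer_id = 'alice'
     AND book_reviews.rating NOT IN (1, 2)
    WHERE books.bookclub_id = ?
    ORDER BY books.book_id
"""
rows = conn.execute(ANSWER_0, (CLUB,)).fetchall()
print(rows)  # [('b1', 5), ('b2', None), ('b3', None)]
```

In this output a book rated 1 or 2 is indistinguishable from an unrated one, so this variant only fits if returning those books is acceptable; keeping the rating test in WHERE, as answer 1 does, excludes them.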

Answer 1 (score: 0)

I believe we've figured it out. We were missing a set of parens in the WHERE clause:

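WHERE  books.bookclub_id = '00000000-0000-0000-0000-000000000000'
       AND ( book_reviews.rating IS NULL
             OR ( book_reviews.rating != 1
                  AND book_reviews.rating != 2 ) );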

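Without them, the boolean logic groups incorrectly: AND binds more tightly than OR, so the bookclub_id condition only applied to the IS NULL branch. This query returns the correct results and has a reasonable query plan, so it looks like that was the whole problem. Thanks for looking.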
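A sketch comparing the question's unparenthesized WHERE clause with the parenthesized one this answer describes, again on SQLite as a stand-in for PostgreSQL (both give AND higher precedence than OR). The data is hypothetical: besides b1 (rated 5), b2 (rated 1) and b3 (unrated) in the target club, alice also rated book x1 a 4 in a second, made-up club. That extra row exposes the grouping problem, which matches the Filter line in the second query plan.

```python
import sqlite3

# Sketch: SQLite stands in for PostgreSQL; UUIDs are stored as TEXT.
CLUB = "00000000-0000-0000-0000-000000000000"
OTHER_CLUB = "11111111-1111-1111-1111-111111111111"  # hypothetical second club

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (bookclub_id TEXT NOT NULL, book_id TEXT NOT NULL,
                        PRIMARY KEY (bookclub_id, book_id));
    CREATE TABLE book_reviews (bookclub_id TEXT NOT NULL, book_id TEXT NOT NULL,
                               reviewer_id TEXT NOT NULL, rating INTEGER NOT NULL,
                               PRIMARY KEY (bookclub_id, book_id, reviewer_id));
""")
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [(CLUB, "b1"), (CLUB, "b2"), (CLUB, "b3"), (OTHER_CLUB, "x1")])
conn.executemany("INSERT INTO book_reviews VALUES (?, ?, ?, ?)",
                 [(CLUB, "b1", "alice", 5), (CLUB, "b2", "alice", 1),
                  (OTHER_CLUB, "x1", "alice", 4)])

BASE = """
    SELECT books.book_id, book_reviews.rating
    FROM books
    LEFT OUTER JOIN book_reviews
      ON books.bookclub_id = book_reviews.bookclub_id
     AND books.book_id = book_reviews.book_id
     AND book_reviews.reviewer_id = 'alice'
"""
# As in the question: parsed as (club AND rating IS NULL) OR (rating not 1 or 2),
# so the club filter does not guard the second branch.
UNGROUPED = BASE + """
    WHERE books.bookclub_id = ?
      AND book_reviews.rating IS NULL
       OR (book_reviews.rating != 1 AND book_reviews.rating != 2)
    ORDER BY books.book_id
"""
# With the extra parentheses the club filter applies to both branches.
GROUPED = BASE + """
    WHERE books.bookclub_id = ?
      AND (book_reviews.rating IS NULL
           OR (book_reviews.rating != 1 AND book_reviews.rating != 2))
    ORDER BY books.book_id
"""
ungrouped_rows = conn.execute(UNGROUPED, (CLUB,)).fetchall()
grouped_rows = conn.execute(GROUPED, (CLUB,)).fetchall()
print(ungrouped_rows)  # [('b1', 5), ('b3', None), ('x1', 4)]
print(grouped_rows)    # [('b1', 5), ('b3', None)]
```

Because the unparenthesized form can match rows from any club, PostgreSQL cannot use the bookclub_id condition to narrow the scan of books, which is consistent with the Seq Scan in the second plan; once every branch is guarded by bookclub_id, the condition can be pushed down to an index scan.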