Question

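Compare these two queries. Is it faster to put the filter in the join criteria or in the WHERE clause? I have always felt that it is faster in the join criteria because it reduces the result set at the earliest possible moment, but I don't know for sure.

I'm going to build some tests to find out, but I'd also like opinions on which is clearer to read.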

Query 1

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
        ON  a.ID = x.TableAID
INNER JOIN  TableB b
        ON  x.TableBID = b.ID
WHERE       a.ID = 1            /* <-- Filter here? */

Query 2

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
        ON  a.ID = x.TableAID
        AND a.ID = 1            /* <-- Or filter here? */
INNER JOIN  TableB b
        ON  x.TableBID = b.ID

EDIT

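I ran some tests and the results show that it is actually very close, but the WHERE clause is actually slightly faster! =)

I absolutely agree that it makes more sense to apply the filter in the WHERE clause; I was just curious about the performance implications.

ELAPSED TIME WHERE CRITERIA: 143016 ms
ELAPSED TIME JOIN CRITERIA: 143256 ms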

TEST

SET NOCOUNT ON;

DECLARE @num    INT,
        @iter   INT

SELECT  @num    = 1000, -- Number of records in TableA and TableB, the cross table is populated with a CROSS JOIN from A to B
        @iter   = 1000  -- Number of select iterations to perform

DECLARE @a TABLE (
        id INT
)

DECLARE @b TABLE (
        id INT
)

DECLARE @x TABLE (
        aid INT,
        bid INT
)

DECLARE @num_curr INT
SELECT  @num_curr = 1

WHILE (@num_curr <= @num)
BEGIN
    INSERT @a (id) SELECT @num_curr
    INSERT @b (id) SELECT @num_curr

    SELECT @num_curr = @num_curr + 1
END

INSERT      @x (aid, bid)
SELECT      a.id,
            b.id
FROM        @a a
CROSS JOIN  @b b

/*
    TEST
*/
DECLARE @begin_where    DATETIME,
        @end_where      DATETIME,
        @count_where    INT,
        @begin_join     DATETIME,
        @end_join       DATETIME,
        @count_join     INT,
        @curr           INT,
        @aid            INT

DECLARE @temp TABLE (
        curr    INT,
        aid     INT,
        bid     INT
)

DELETE FROM @temp

SELECT  @curr   = 0,
        @aid    = 50

SELECT  @begin_where = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
    INSERT      @temp (curr, aid, bid)
    SELECT      @curr,
                aid,
                bid
    FROM        @a a
    INNER JOIN  @x x
            ON  a.id = x.aid
    INNER JOIN  @b b
            ON  x.bid = b.id
    WHERE       a.id = @aid

    SELECT @curr = @curr + 1
END
SELECT  @end_where = CURRENT_TIMESTAMP

SELECT  @count_where = COUNT(1) FROM @temp
DELETE FROM @temp

SELECT  @curr = 0
SELECT  @begin_join = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
    INSERT      @temp (curr, aid, bid)
    SELECT      @curr,
                aid,
                bid
    FROM        @a a
    INNER JOIN  @x x
            ON  a.id = x.aid
            AND a.id = @aid
    INNER JOIN  @b b
            ON  x.bid = b.id

    SELECT @curr = @curr + 1
END
SELECT  @end_join = CURRENT_TIMESTAMP

SELECT  @count_join = COUNT(1) FROM @temp
DELETE FROM @temp

SELECT  @count_where AS count_where,
        @count_join AS count_join,
        DATEDIFF(millisecond, @begin_where, @end_where) AS elapsed_where,
        DATEDIFF(millisecond, @begin_join, @end_join) AS elapsed_join

Answer 1

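Performance-wise, they are the same (and produce the same plans).

Logically, you should write the query so that it still makes sense if you replace INNER JOIN with LEFT JOIN.

In your case, that would look like this: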

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
        AND a.ID = 1
LEFT JOIN
        TableB b
ON      x.TableBID = b.ID

or this:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
LEFT JOIN
        TableB b
ON      b.id = x.TableBID
WHERE   a.id = 1

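The former query will not return any actual matches for a.id other than 1 (the other TableA rows come back with NULLs on the joined side), so the latter syntax (with WHERE) is logically more consistent.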
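To make the difference between the two LEFT JOIN forms concrete, here is a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the sample rows are invented for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableXRef (TableAID INTEGER, TableBID INTEGER);
    -- Made-up sample data: two rows per table, one cross reference each.
    INSERT INTO TableA VALUES (1), (2);
    INSERT INTO TableB VALUES (10), (20);
    INSERT INTO TableXRef VALUES (1, 10), (2, 20);
""")

# Filter in the ON clause: every TableA row survives the LEFT JOIN;
# rows with ID <> 1 simply find no match and are padded with NULLs.
on_rows = conn.execute("""
    SELECT a.ID, x.TableBID, b.ID
    FROM TableA a
    LEFT JOIN TableXRef x ON x.TableAID = a.ID AND a.ID = 1
    LEFT JOIN TableB b ON x.TableBID = b.ID
    ORDER BY a.ID
""").fetchall()

# Filter in the WHERE clause: rows with ID <> 1 are removed from the result.
where_rows = conn.execute("""
    SELECT a.ID, x.TableBID, b.ID
    FROM TableA a
    LEFT JOIN TableXRef x ON x.TableAID = a.ID
    LEFT JOIN TableB b ON b.ID = x.TableBID
    WHERE a.ID = 1
    ORDER BY a.ID
""").fetchall()

print(on_rows)     # [(1, 10, 10), (2, None, None)]
print(where_rows)  # [(1, 10, 10)]
```

The join semantics shown here are standard SQL, so the same row sets should come back from SQL Server, but this sketch says nothing about that engine's plans or timings.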

Answer 2

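For inner joins, it doesn't matter where you put your criteria. The SQL compiler will transform both into an execution plan in which the filtering occurs below the join (i.e., as if the filter expression appeared in the join condition).

Outer joins are a different matter, since the placement of the filter changes the semantics of the query.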
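A quick way to convince yourself of the inner join case is to run both forms against the same data and compare the rows. The sketch below assumes SQLite (via Python's sqlite3) rather than SQL Server, and mirrors the question's test setup with a much smaller cross-joined table; it checks result equality only, not plan equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE x (aid INTEGER, bid INTEGER);
""")
ids = [(i,) for i in range(1, 6)]          # 5 rows in a and in b
conn.executemany("INSERT INTO a VALUES (?)", ids)
conn.executemany("INSERT INTO b VALUES (?)", ids)
# Populate the cross table from a CROSS JOIN, as in the question's test.
conn.execute("INSERT INTO x SELECT a.id, b.id FROM a CROSS JOIN b")

where_sql = """
    SELECT x.aid, x.bid FROM a
    INNER JOIN x ON a.id = x.aid
    INNER JOIN b ON x.bid = b.id
    WHERE a.id = ?
    ORDER BY x.bid
"""
on_sql = """
    SELECT x.aid, x.bid FROM a
    INNER JOIN x ON a.id = x.aid AND a.id = ?
    INNER JOIN b ON x.bid = b.id
    ORDER BY x.bid
"""
where_rows = conn.execute(where_sql, (3,)).fetchall()
on_rows = conn.execute(on_sql, (3,)).fetchall()

print(where_rows == on_rows)  # True
print(len(where_rows))        # 5
```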

Answer 3

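In terms of the two methods:

JOIN / ON is for joining tables
WHERE is for filtering results

Whilst you can use them differently, it always seems like a code smell to me.

Deal with performance when it becomes a problem. Then you can look into such "optimisations".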

Answer 4

With any query optimizer worth a cent... they are identical.

Answer 5

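I would guess the first, because it applies a more specific filter to the data. But you should look at the execution plan, as with any optimization, because it can vary a lot depending on the size of the data, the server hardware, and so on.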

Answer 6

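Is it faster? Try it and see.

Which is easier to read? To me the first looks more "correct", as the moved condition really has nothing to do with the join.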

Answer 7

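It is really unlikely that the placement of this condition will be the deciding factor for performance. I am not intimately familiar with execution planning in T-SQL, but it's likely that both forms will be optimized automatically to similar plans.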

Answer 8

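Rule #0: Run some benchmarks and see! The only way to really tell which will be faster is to try it. These types of benchmarks are very easy to perform using the SQL profiler.

Also, examine the execution plans for the query written with the filter in the JOIN and in the WHERE clause to see what differences stand out.

Finally, as others have said, these two should be treated identically by any decent optimizer, including the one built into SQL Server.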
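As a rough illustration of such a benchmark, here is a minimal timing harness. It is only a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's sqlite3 instead of SQL Server, the helper names `build_db` and `time_query` are made up for this example, and the table sizes are far smaller than in the question's test. Timings from it say nothing about SQL Server; the point is the shape of the measurement.

```python
import sqlite3
import time

def build_db(num):
    """Create tables a, b and a cross table x holding num * num rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        CREATE TABLE x (aid INTEGER, bid INTEGER);
    """)
    ids = [(i,) for i in range(1, num + 1)]
    conn.executemany("INSERT INTO a VALUES (?)", ids)
    conn.executemany("INSERT INTO b VALUES (?)", ids)
    conn.execute("INSERT INTO x SELECT a.id, b.id FROM a CROSS JOIN b")
    return conn

def time_query(conn, sql, param, iterations):
    """Run sql repeatedly; return (total rows fetched, elapsed seconds)."""
    total = 0
    start = time.perf_counter()
    for _ in range(iterations):
        total += len(conn.execute(sql, (param,)).fetchall())
    return total, time.perf_counter() - start

WHERE_SQL = """
    SELECT x.aid, x.bid FROM a
    INNER JOIN x ON a.id = x.aid
    INNER JOIN b ON x.bid = b.id
    WHERE a.id = ?
"""
JOIN_SQL = """
    SELECT x.aid, x.bid FROM a
    INNER JOIN x ON a.id = x.aid AND a.id = ?
    INNER JOIN b ON x.bid = b.id
"""

conn = build_db(num=50)
count_where, elapsed_where = time_query(conn, WHERE_SQL, 25, iterations=200)
count_join, elapsed_join = time_query(conn, JOIN_SQL, 25, iterations=200)

# Equal counts confirm both forms returned the same number of rows.
print(count_where, count_join)
print(f"where: {elapsed_where:.4f}s  join: {elapsed_join:.4f}s")
```

Differences of a fraction of a percent, like the one reported in the question, are usually within run-to-run noise, so it is worth repeating the measurement several times and alternating the order of the two loops before drawing a conclusion.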

Answer 9

In PostgreSQL, they are the same. We know this because if you run explain analyze on each of the queries, the plans come out the same. Take this example:

# explain analyze select e.* from event e join result r on e.id = r.event_id and r.team_2_score=24;

                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.045..0.047 rows=1 loops=1)
   Hash Cond: (e.id = r.event_id)
   ->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.009..0.010 rows=2 loops=1)
   ->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.017..0.017 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.008 rows=1 loops=1)
               Filter: (team_2_score = 24)
               Rows Removed by Filter: 1
 Planning time: 0.182 ms
 Execution time: 0.101 ms
(10 rows)

# explain analyze select e.* from event e join result r on e.id = r.event_id where r.team_2_score=24;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.027..0.029 rows=1 loops=1)
   Hash Cond: (e.id = r.event_id)
   ->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.010..0.011 rows=2 loops=1)
   ->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.010..0.010 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.007 rows=1 loops=1)
               Filter: (team_2_score = 24)
               Rows Removed by Filter: 1
 Planning time: 0.140 ms
 Execution time: 0.058 ms
(10 rows)

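Both have the same minimum and maximum cost as well as the same query plan. Also, notice that even in the top query, the team_2_score condition is applied as a "Filter" on the scan rather than as part of the join condition.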
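Comparing two plans by eye gets tedious because the measured timings differ on every run. A small sketch of one way to compare them mechanically: mask the timing figures, then compare what is left. The helper name `normalize_plan` is made up for this example, and the input is an abbreviated excerpt of the two plans above.

```python
import re

def normalize_plan(plan_text):
    """Return plan lines with run-specific timings masked or dropped."""
    lines = []
    for line in plan_text.strip("\n").splitlines():
        # Planning/Execution time lines vary per run; skip them.
        if line.strip().startswith(("Planning time:", "Execution time:")):
            continue
        # Mask "actual time=0.045..0.047" but keep costs and row counts.
        lines.append(re.sub(r"actual time=[\d.]+", "actual time=<t>", line.rstrip()))
    return lines

plan_on = """
 Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.045..0.047 rows=1 loops=1)
   Hash Cond: (e.id = r.event_id)
         ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.008 rows=1 loops=1)
               Filter: (team_2_score = 24)
 Planning time: 0.182 ms
 Execution time: 0.101 ms
"""

plan_where = """
 Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.027..0.029 rows=1 loops=1)
   Hash Cond: (e.id = r.event_id)
         ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.007 rows=1 loops=1)
               Filter: (team_2_score = 24)
 Planning time: 0.140 ms
 Execution time: 0.058 ms
"""

print(normalize_plan(plan_on) == normalize_plan(plan_where))  # True
```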

Which SQL query is faster? Filter in the join criteria or the WHERE clause?
