Looking to remove an ugly self join on a CTE

Time: 2018-01-01 18:39:15

Tags: sql sql-server-2008 common-table-expression self-join

I have a query that creates a sorted dictionary (sorted in the sense that an incremental id identifies the relative position of each key).

For each row, I want to know whether its val exists as a key in any later row of the dictionary. I am doing this with a correlated query in a CROSS APPLY, which is effectively a self join on the CTE.

As I understand it, this means that the CTE representing the dictionary has to be computed twice?

Other than using a table variable (this is inside a function), does anyone have any alternative suggestions?

WITH dictionary([id], [key], [val]) AS
(
            SELECT 1, 'a', 'b'
  UNION ALL SELECT 2, 'b', 'c'
  UNION ALL SELECT 3, 'c', 'a'
  UNION ALL SELECT 4, 'x', 'w'
  UNION ALL SELECT 5, 'y', 'x'
  UNION ALL SELECT 6, 'z', 'y'
)
SELECT
  *
FROM
  dictionary   dict
CROSS APPLY
(
  SELECT COUNT(*) FROM dictionary WHERE dictionary.id > dict.id AND dictionary.[key] = dict.[val]
)
  lookup(hits)
CROSS APPLY
(
            SELECT 1, 3 WHERE lookup.hits = 0
  UNION ALL SELECT 1, 2 WHERE lookup.hits > 0
  UNION ALL SELECT 2, 3 WHERE lookup.hits > 0
)
  map([from], [to])

-- [key]s 'c', 'x', 'y' and 'z' should only have one output row
-- It's "acceptable" for only 'z' to have just one output row IFF a self join can be avoided

The other options I can think of are all variations on a self join...

  dictionary   dict
LEFT JOIN
(
  SELECT [key], MAX(id) AS id FROM dictionary GROUP BY [key]
)
  lookup
    ON  lookup.[key] = dict.[val]
    AND lookup.id    > dict.id

Or...

  dictionary   dict
OUTER APPLY
(
  SELECT 1 WHERE EXISTS (SELECT * FROM dictionary WHERE dictionary.id > dict.id AND dictionary.[key] = dict.[val])
)
  lookup(hits)

However, I am trying to avoid the self join on the CTE, possibly with a window function that I haven't thought of? Anything, just to avoid the CTE being computed twice...

(Ignoring the lookup.id > dict.id aspect is fine, if it means avoiding the self join...)

Edit: added a more complete example, and also a SQL Fiddle, thanks to @MartinSmith for pointing out some inconsistencies...

http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/17407

1 answer:

Answer 0: (score: 1)

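Here is one approach that uses window functions.

It first unpivots the rows so that keys and values become generic terms, then uses MAX ... OVER (PARTITION BY term) to find the id of the highest row in which that term is used as a key.

In this example, it then sets a flag and discards the duplicate rows added by the unpivot (keeping the context = 'v' row of each pair, because that is the row carrying the information the flag needs).

You can then use that flag to join to a table value constructor containing the map values.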
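As a small sketch of that last step in isolation, the join to the table value constructor can be tried with hard-coded flags. The derived table f and its two flag values are made up here purely for illustration; they stand in for the flag that the window function computes.

```sql
-- Hypothetical stand-in for the computed flag: one row with flag 0, one with flag 1.
SELECT f.flg,
       map.[from],
       map.[to]
FROM   (VALUES (0),
               (1)) f(flg)
       -- Same map values as in the question: flag 0 gets one pair, flag 1 gets two.
       JOIN (VALUES (1, 3, 0),
                    (1, 2, 1),
                    (2, 3, 1)) map([from], [to], [flg])
         ON map.flg = f.flg
ORDER  BY f.flg,
          map.[from];
-- flg 0 -> (1, 3); flg 1 -> (1, 2) and (2, 3)
```

The full query below applies the same join to the flag derived from the windowed MAX.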

WITH dictionary(id, [key], value)
     AS (
            SELECT 1, 'a', 'b' 
  UNION ALL SELECT 2, 'b', 'c'
  UNION ALL SELECT 3, 'c', 'a'
  UNION ALL SELECT 4, 'x', 'w'
  UNION ALL SELECT 5, 'y', 'x'
  UNION ALL SELECT 6, 'z', 'y'   
     ),
     t1
     AS (SELECT dict.*,
                context,
                highest_id_where_term_is_key = MAX(CASE
                                                     WHEN context = 'k'
                                                       THEN v.id
                                                   END) OVER (PARTITION BY term)
         FROM   dictionary dict
                CROSS APPLY (VALUES(id, [key], 'k'),
                                   (id, value, 'v')) v(id, term, context)),
     t2
     AS (SELECT *,
                val_in_later_key = CASE
                                     WHEN id < highest_id_where_term_is_key
                                       THEN 1
                                     ELSE 0
                                   END
         FROM   t1
         WHERE  context = 'v' 
         -- Discard duplicate row from the unpivot - only want the "value" row
        )
SELECT id,
       [key],
       value,
       highest_id_where_term_is_key,
       map.[from],
       map.[to]
FROM   t2
       JOIN (VALUES (1, 3, 0),
                    (1, 2, 1),
                    (2, 3, 1) ) map([from], [to], [flg])
         ON map.flg = t2.val_in_later_key
ORDER  BY id 

Returns

+----+-----+-------+------------------------------+------+----+
| id | key | value | highest_id_where_term_is_key | from | to |
+----+-----+-------+------------------------------+------+----+
|  1 | a   | b     | 2                            |    1 |  2 |
|  1 | a   | b     | 2                            |    2 |  3 |
|  2 | b   | c     | 3                            |    1 |  2 |
|  2 | b   | c     | 3                            |    2 |  3 |
|  3 | c   | a     | 1                            |    1 |  3 |
|  4 | x   | w     | NULL                         |    1 |  3 |
|  5 | y   | x     | 4                            |    1 |  3 |
|  6 | z   | y     | 5                            |    1 |  3 |
+----+-----+-------+------------------------------+------+----+
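
To see what the unpivot step produces before the flag and the map join are applied, the first two stages can be run on their own. This is only a sketch over the first three sample rows, assuming the same column names as the query above; the ORDER BY is added just to make the partitions easy to read.

```sql
WITH dictionary(id, [key], value)
     AS (
            SELECT 1, 'a', 'b'
  UNION ALL SELECT 2, 'b', 'c'
  UNION ALL SELECT 3, 'c', 'a'
     )
SELECT v.id,
       v.term,
       v.context,
       -- Highest id among the rows where this term appears as a key ('k').
       highest_id_where_term_is_key = MAX(CASE
                                            WHEN v.context = 'k'
                                              THEN v.id
                                          END) OVER (PARTITION BY v.term)
FROM   dictionary dict
       -- Unpivot: each dictionary row becomes one 'k' row and one 'v' row.
       CROSS APPLY (VALUES(dict.id, dict.[key], 'k'),
                          (dict.id, dict.value, 'v')) v(id, term, context)
ORDER  BY v.term,
          v.context;
```

Each term ends up with a 'k' row and a 'v' row that share the same highest_id_where_term_is_key, which is why the full query only needs to keep the 'v' row and compare its id against that value.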