我有一个类似于下表的链接交易表
+----+----+----+
| # | A | B |
+----+----+----+
| 1 | 1 | 4 |
| 2 | 3 | 5 |
| 3 | 4 | 6 |
| 4 | 5 | 8 |
| 5 | 6 | 1 |
| 6 | 7 | 7 |
| 7 | 8 | 3 |
| 8 | 9 | 3 |
| 9 | 10 | 4 |
| 10 | 11 | 14 |
| 11 | 2 | 2 |
| 12 | 12 | 4 |
| 13 | 13 | 14 |
| 14 | 14 | 9 |
| 15 | 15 | 1 |
+----+----+----+
A列和B列下的数字代表交易ID。因此,例如,交易1通过某些标准与交易4相关联,tran 3与tran 5相关联,tran 4与tran 6相关联等等。
交易2和7未与任何其他交易相关联,因此它们是自我链接的。
我想要提取的是此表中的交易系列 - 由于tran 1和4是链接的,tran 4和6是链接的,tran 10和4是链接的等等它们属于一个交易家族 - (1,4,6 ,10,12,15)。
我想创建具有最低事务ID的事务系列,即主事务。 理想情况下,输出看起来像这样
+----+------+--------------+
| # | Tran | Master_tran |
+----+------+--------------+
| 1 | 1 | 1 |
| 2 | 3 | 3 |
| 3 | 4 | 1 |
| 4 | 5 | 3 |
| 5 | 6 | 1 |
| 6 | 7 | 7 |
| 7 | 8 | 3 |
| 8 | 9 | 3 |
| 9 | 10 | 1 |
| 10 | 11 | 3 |
| 11 | 2 | 2 |
| 12 | 12 | 1 |
| 13 | 13 | 3 |
| 14 | 14 | 3 |
| 15 | 15 | 1 |
+----+------+----+
我一直在玩自我加入。
SELECT t1.a as x,
least (min(t1.b), min(t2.a)) as y
FROM test t1
LEFT JOIN test t2 on t2.b = t1.a
GROUP BY t1.a
ORDER BY t1.a asc
此代码提供以下outupt
+------+----+---+
| Col1 | X | Y |
+------+----+---+
| 1 | 1 | 4 |
| 2 | 2 | 2 |
| 3 | 3 | 5 |
| 4 | 4 | 1 |
| 5 | 5 | 3 |
| 6 | 6 | 1 |
| 7 | 7 | 7 |
| 8 | 8 | 3 |
| 9 | 9 | 3 |
| 10 | 10 | |
| 11 | 11 | |
| 12 | 12 | |
| 13 | 13 | |
| 14 | 14 | 9 |
| 15 | 15 | |
+------+----+---+
我不确定我的代码有什么问题。有人能指出我正确的方向吗? 谢谢!
答案 0 :(得分:0)
原则上你需要一个CONNECT BY语句来解决这样的分层问题。 虽然你有循环循环,你还需要一个NOCYCLE子句,这将消除循环中的最后一个链接,这很好,因为该链接永远不会成为答案的一部分。 你也有两个方向的链接(f.e.(13,14)和(14,9)),所以你必须小心将它包含在你的查询中(两次!)。
WITH t_order
AS (SELECT qt.qt_id, qt.qt_a, qt.qt_b, LEAST( qt.qt_a, qt.qt_b ) AS t_parent, GREATEST( qt.qt_a, qt.qt_b ) AS t_child
FROM query_test qt
UNION
SELECT qb.qt_id, qb.qt_a, qb.qt_b, GREATEST( qb.qt_a, qb.qt_b ) AS t_parent, LEAST( qb.qt_a, qb.qt_b ) AS t_child
FROM query_test qb)
, hier
AS (SELECT ps.qt_id
, ps.qt_a
, ps.qt_b
, t_parent
, t_child
, LEVEL
, CONNECT_BY_ROOT t_parent AS prev_tran
FROM t_order ps
CONNECT BY NOCYCLE PRIOR t_child = t_parent)
SELECT hr.qt_id, hr.qt_a, MIN( hr.prev_tran ) AS master_tran
FROM hier hr
GROUP BY hr.qt_id, hr.qt_a
ORDER BY hr.qt_id, hr.qt_a;
这将解决您的问题,但如果必须处理这些100.000记录,则可能会变得非常慢。如果您需要将此方法与许多其他列组合,那么SQL语句也很难理解。为此,您应该将所有qt.qt列分解出来并在最后一次选择中加入它们。
WITH t_order
AS (SELECT DISTINCT tran, root_tran
FROM (SELECT LEAST( qt.qt_a, qt.qt_b ) AS tran, GREATEST( qt.qt_a, qt.qt_b ) AS root_tran
FROM query_test qt
UNION
SELECT GREATEST( qb.qt_a, qb.qt_b ) AS tran, LEAST( qb.qt_a, qb.qt_b ) AS root_tran
FROM query_test qb))
, hier
AS (SELECT DISTINCT tran, root_tran
FROM (SELECT tran, CONNECT_BY_ROOT root_tran AS root_tran
FROM t_order
CONNECT BY NOCYCLE PRIOR tran = root_tran)
WHERE tran >= root_tran)
SELECT qt.qt_id
, qt.qt_a
, MIN( LEAST( h1.root_tran, h2.root_tran ) ) AS master_tran
FROM query_test qt
INNER JOIN hier h1 ON qt.qt_a = h1.tran
INNER JOIN hier h2 ON qt.qt_b = h2.tran
GROUP BY qt.qt_id, qt.qt_a
ORDER BY qt.qt_id, qt.qt_a;
我无法测试最后一句话。
答案 1 :(得分:0)
我可能已经创造了其他解决方案
您也可以将链接加倍,而不是使用CONNECT BY语句,并在需要时随时加倍。
检索所有链接的查询保持不变,但后面跟着一个简单的查询,用两个链接的所有不同组合替换原始链接。
包括由tran_a和tran_b组成的链接,您有2 + 1 + 2个链接,因此您可以找到最多5个链接的路径。
如果这是短的,你在前一个子查询下插入一个相同的子查询,现在它是4 + 1 + 4使9个链接长。
如您所见,每个添加的子查询的最大路径长度加倍,而性能成本仅略高。
首先查询您的演示数据:
WITH double_0
AS (SELECT DISTINCT root_tran, tran
FROM ( SELECT LEAST( td_0.tran_a, td_0.tran_b ) AS root_tran
, GREATEST( td_0.tran_a, td_0.tran_b ) AS tran
FROM tran_demo td_0
UNION
SELECT GREATEST( qb.tran_a, qb.tran_b ) AS root_tran
, LEAST( qb.tran_a, qb.tran_b ) AS tran
FROM tran_demo qb ))
, double_1
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_0 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
SELECT td_1.td_id
, td_1.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_1
INNER JOIN double_1 d1 ON td_1.tran_a = d1.tran
INNER JOIN double_1 d2 ON td_1.tran_b = d2.tran
GROUP BY td_1.td_id, td_1.tran_a
ORDER BY td_1.td_id, td_1.tran_a;
然后你如何修改:
请注意,您现在在最终查询中查询 double_2 。
WITH double_0
AS (SELECT DISTINCT root_tran, tran
FROM ( SELECT LEAST( td_0.tran_a, td_0.tran_b ) AS root_tran
, GREATEST( td_0.tran_a, td_0.tran_b ) AS tran
FROM tran_demo td_0
UNION
SELECT GREATEST( qb.tran_a, qb.tran_b ) AS root_tran
, LEAST( qb.tran_a, qb.tran_b ) AS tran
FROM tran_demo qb ))
, double_1
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_0 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
, double_2
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_1 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
SELECT td_1.td_id
, td_1.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_1
INNER JOIN double_2 d1 ON td_1.tran_a = d1.tran
INNER JOIN double_2 d2 ON td_1.tran_b = d2.tran
GROUP BY td_1.td_id, td_1.tran_a
ORDER BY td_1.td_id, td_1.tran_a;
最后一个查询来检查您使用的路径长度是否仍然足够: 您已添加下一级别并减去当前级别 只要此查询没有返回任何行,当前查询就是正确的。
WITH double_0
AS (SELECT DISTINCT root_tran, tran
FROM ( SELECT LEAST( td_0.tran_a, td_0.tran_b ) AS root_tran
, GREATEST( td_0.tran_a, td_0.tran_b ) AS tran
FROM tran_demo td_0
UNION
SELECT GREATEST( qb.tran_a, qb.tran_b ) AS root_tran
, LEAST( qb.tran_a, qb.tran_b ) AS tran
FROM tran_demo qb ))
, double_1
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_0 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
, double_2
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_1 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
SELECT td_1.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_1
INNER JOIN double_2 d1 ON td_1.tran_a = d1.tran
INNER JOIN double_2 d2 ON td_1.tran_b = d2.tran
GROUP BY td_1.tran_a
MINUS
SELECT td_2.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_2
INNER JOIN double_1 d1 ON td_2.tran_a = d1.tran
INNER JOIN double_1 d2 ON td_2.tran_b = d2.tran
GROUP BY td_2.tran_a
ORDER BY tran_a;
性能测试你必须自己做。
我很乐观,而子查询很便宜,每次有效路径长度加倍。
迟早这应该比以前的解决方案更快。
顺便说一下,关于对原始链接进行排序的说法也适用于此!
如果有效,请标记我的答案。