Question

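I have many tables that follow this fairly common pattern: A <-->> B. I want to find pairs of matching rows in table A that have equal values in certain columns and whose referencing rows in B also have equal values in certain columns. In other words, a pair (R, S) in A matches iff, for given columns {a₁, a₂, ..., aₙ} in A and {b₁, b₂, ..., bₙ} in B: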

1. R.a₁ = S.a₁, R.a₂ = S.a₂, ..., R.aₙ = S.aₙ, and
2. for every row T in B that references R, there exists a row U in B that references S such that T.b₁ = U.b₁, T.b₂ = U.b₂, ..., T.bₙ = U.bₙ.

(R, S) matches iff (S, R) matches.

(I'm not very familiar with relational algebra, so my definition above may not follow any convention.)
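The definition can also be read as a plain predicate. The Python sketch below checks it for a single pair, assuming each row is a dict and that the rows of B referencing R and S have already been collected into lists; `rows_match`, `covers` and the `kind` column are hypothetical names used only for illustration. It reads the symmetry requirement as applying condition 2 in both directions.

```python
from typing import Mapping, Sequence

Row = Mapping[str, object]


def covers(ts: Sequence[Row], us: Sequence[Row], b_cols: Sequence[str]) -> bool:
    """Condition 2: every T in ts has some U in us with equal values in b_cols."""
    return all(
        any(all(t[c] == u[c] for c in b_cols) for u in us)
        for t in ts
    )


def rows_match(
    r: Row,
    s: Row,
    r_refs: Sequence[Row],  # rows of B that reference R
    s_refs: Sequence[Row],  # rows of B that reference S
    a_cols: Sequence[str],
    b_cols: Sequence[str],
) -> bool:
    # Condition 1: R and S agree on the compared columns of A.
    if any(r[c] != s[c] for c in a_cols):
        return False
    # Condition 2, checked both ways so that (R, S) matches iff (S, R) matches.
    return covers(r_refs, s_refs, b_cols) and covers(s_refs, r_refs, b_cols)


# Hypothetical example: A rows with a "kind" column, B rows with a "data" column.
r = {"id": 1, "kind": "x"}
s = {"id": 2, "kind": "x"}
t = {"id": 3, "kind": "x"}
refs = {
    1: [{"data": "foo"}, {"data": "bar"}],
    2: [{"data": "bar"}, {"data": "foo"}],
    3: [{"data": "foo"}],
}
print(rows_match(r, s, refs[1], refs[2], ["kind"], ["data"]))  # True
print(rows_match(r, t, refs[1], refs[3], ["kind"], ["data"]))  # False
```

Because condition 2 only asks for some matching U, this reading ignores multiplicities: references [foo, foo] and [foo] satisfy it, whereas the counting steps in the question would reject that pair at step 2.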

The approach I came up with is:

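1. Find the (R, S) pairs whose compared columns match.
2. Check whether R and S are referenced by the same number of rows (of any kind) in B.
3. For each row in B, find the matching rows, group them by the referenced row in A and count them. Check that the number of matching rows equals the number of referencing rows.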
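The three steps can be mirrored in plain Python to make the counting explicit. This is a sketch, assuming the B side is reduced to (a_id, key) tuples, where key holds the values of the compared B columns, and that a hypothetical `a_key` function returns the values of the compared A columns; `matching_pairs` is likewise a made-up name. Step 3 counts matching (T, U) pairs, as the self-join in the question's query does.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Callable, Hashable, Iterable


def matching_pairs(
    a_ids: Iterable[int],
    b_rows: Iterable[tuple[int, Hashable]],  # (a_id, key) for each row of B
    a_key: Callable[[int], Hashable],  # values of the compared A columns
) -> list[tuple[int, int]]:
    # Collect the keys of the B rows that reference each A row.
    refs: defaultdict[int, list[Hashable]] = defaultdict(list)
    for a_id, key in b_rows:
        refs[a_id].append(key)

    result = []
    for p, q in combinations(sorted(a_ids), 2):
        # Step 1: the compared A columns must be equal.
        if a_key(p) != a_key(q):
            continue
        # Step 2: p and q must be referenced by the same number of B rows.
        if len(refs[p]) != len(refs[q]):
            continue
        # Step 3: count matching (T, U) pairs, as the question's self-join does,
        # and compare that count with the number of referencing rows.
        counts_q = Counter(refs[q])
        if sum(counts_q[key] for key in refs[p]) == len(refs[p]):
            result.append((p, q))
    return result


# The question's test data for table B, reduced to (a_id, data).
b = [(1, "foo"), (1, "bar"), (2, "foo"), (2, "bar"), (3, "foo"), (3, "bar"), (4, "foo")]
# No A columns are compared here, so every A row gets the same (empty) key.
print(matching_pairs([1, 2, 3, 4], b, lambda a_id: ()))  # [(1, 2), (1, 3), (2, 3)]
```

Because step 3 counts pairs rather than checking each value, a duplicate can stand in for a missing value: with references [foo, foo] and [foo, bar], step 2 passes (2 = 2) and step 3 finds 2 matching pairs, so the pair is accepted even though 'bar' has no counterpart. The question's query counts the same pairs and behaves the same way.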

However, the query I wrote (below) for steps 2 and 3, which finds the matching rows in B, is quite complex. Is there a better solution?

-- Tables similar to those that I have.
CREATE TABLE a (
    id INTEGER PRIMARY KEY,
    data TEXT
);

CREATE TABLE b (
    id INTEGER PRIMARY KEY,
    a_id INTEGER REFERENCES a (id),
    data TEXT
);

SELECT DISTINCT dup.lhs_parent_id, dup.rhs_parent_id
FROM (
    SELECT DISTINCT
        MIN(lhs.a_id, rhs.a_id) AS lhs_parent_id, -- Normalize.
        MAX(lhs.a_id, rhs.a_id) AS rhs_parent_id,
        COUNT(*) AS count
    FROM b lhs
    INNER JOIN b rhs USING (data)
    -- Remove self-matching rows and duplicate values with the same parent.
    WHERE NOT (lhs.id = rhs.id OR lhs.a_id = rhs.a_id)
    GROUP BY lhs.a_id, rhs.a_id
) dup
INNER JOIN (
    -- Check that lhs has the same number of rows.
    SELECT a_id AS parent_id, COUNT(*) AS count
    FROM b
    GROUP BY a_id
) lhs_ct ON (
    dup.lhs_parent_id = lhs_ct.parent_id
    AND dup.count = lhs_ct.count
)
INNER JOIN (
    -- Check that rhs has the same number of rows.
    SELECT a_id AS parent_id, COUNT(*) AS count
    FROM b
    GROUP BY a_id
) rhs_ct ON (
    dup.rhs_parent_id = rhs_ct.parent_id
    AND dup.count = rhs_ct.count
);

-- Test data.
-- Expected query result is three rows with values (1, 2), (1, 3) and (2, 3) for a_id,
-- since the first three rows (with values 'row 1', 'row 2' and 'row 3')
-- have referencing rows, each of which has a matching pair. The fourth row
-- ('row 4') only has one referencing row with the value 'foo', so it has no
-- counterpart for the referencing rows with the value 'bar'.
INSERT INTO a (id, data) VALUES (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), (4, 'row 4');
INSERT INTO b (id, a_id, data) VALUES
    (1, 1, 'foo'), (2, 1, 'bar'),
    (3, 2, 'foo'), (4, 2, 'bar'),
    (5, 3, 'foo'), (6, 3, 'bar'),
    (7, 4, 'foo');
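To run the question's query against its own test data, here is a sketch using Python's standard `sqlite3` module with an in-memory database. It loads the data before querying (the script above lists the query first) and adds an ORDER BY so the output order is deterministic; apart from layout and comments the query is unchanged.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a (id), data TEXT);
INSERT INTO a (id, data) VALUES (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), (4, 'row 4');
INSERT INTO b (id, a_id, data) VALUES
    (1, 1, 'foo'), (2, 1, 'bar'), (3, 2, 'foo'), (4, 2, 'bar'),
    (5, 3, 'foo'), (6, 3, 'bar'), (7, 4, 'foo');
"""

QUERY = """
SELECT DISTINCT dup.lhs_parent_id, dup.rhs_parent_id
FROM (
    SELECT DISTINCT
        MIN(lhs.a_id, rhs.a_id) AS lhs_parent_id,  -- two-argument MIN/MAX are scalar in SQLite
        MAX(lhs.a_id, rhs.a_id) AS rhs_parent_id,
        COUNT(*) AS count
    FROM b lhs
    INNER JOIN b rhs USING (data)
    WHERE NOT (lhs.id = rhs.id OR lhs.a_id = rhs.a_id)
    GROUP BY lhs.a_id, rhs.a_id
) dup
INNER JOIN (SELECT a_id AS parent_id, COUNT(*) AS count FROM b GROUP BY a_id) lhs_ct
    ON dup.lhs_parent_id = lhs_ct.parent_id AND dup.count = lhs_ct.count
INNER JOIN (SELECT a_id AS parent_id, COUNT(*) AS count FROM b GROUP BY a_id) rhs_ct
    ON dup.rhs_parent_id = rhs_ct.parent_id AND dup.count = rhs_ct.count
ORDER BY 1, 2
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(QUERY).fetchall()
conn.close()
print(rows)  # [(1, 2), (1, 3), (2, 3)]
```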

I'm using SQLite.

Answer 1

To find matching and differing rows, it's easier to use the INTERSECT and MINUS operations (SQLite spells MINUS as EXCEPT) and then join...
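The set-operation version is not spelled out here. One possible reading, sketched with Python's `sqlite3` module, uses EXCEPT to list, for each ordered pair of A rows, the `data` values that one has and the other lacks, and then keeps the pairs with nothing missing in either direction. The CTE name `missing` is made up, and, like the question's query, the sketch does not compare A's own columns.

```python
import sqlite3

SETUP = """
CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a (id), data TEXT);
INSERT INTO a (id, data) VALUES (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), (4, 'row 4');
INSERT INTO b (id, a_id, data) VALUES
    (1, 1, 'foo'), (2, 1, 'bar'), (3, 2, 'foo'), (4, 2, 'bar'),
    (5, 3, 'foo'), (6, 3, 'bar'), (7, 4, 'foo');
"""

QUERY = """
WITH missing AS (
    -- (p, q, value): p has a referencing row with this value, q has none.
    SELECT p.id AS p_id, q.id AS q_id, b.data
    FROM a p JOIN a q ON p.id <> q.id JOIN b ON b.a_id = p.id
    EXCEPT
    SELECT p.id, q.id, b.data
    FROM a p JOIN a q ON p.id <> q.id JOIN b ON b.a_id = q.id
)
SELECT p.id, q.id
FROM a p JOIN a q ON p.id < q.id
WHERE NOT EXISTS (SELECT 1 FROM missing m WHERE m.p_id = p.id AND m.q_id = q.id)
  AND NOT EXISTS (SELECT 1 FROM missing m WHERE m.p_id = q.id AND m.q_id = p.id)
ORDER BY p.id, q.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
pairs = conn.execute(QUERY).fetchall()
conn.close()
print(pairs)  # [(1, 2), (1, 3), (2, 3)]
```

Because EXCEPT works on sets, this version ignores how often a value occurs, which follows the "for every T there exists a U" wording of the definition rather than the counting approach.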

But when only one field is actually used in the comparison, a JOIN solution looks better:

SELECT b1.a_id, b2.a_id
FROM (
    SELECT data, a_id, COUNT(id) AS a_count
    FROM b
    GROUP BY data, a_id
) b1
INNER JOIN (
    SELECT data, a_id, COUNT(id) AS a_count
    FROM b
    GROUP BY data, a_id
) b2 ON b1.data = b2.data AND b1.a_count = b2.a_count AND b1.a_id <> b2.a_id;

As I understand it, you need to find pairs of different a_id values that have the same data values with the same counts.

My script returns the pairs in both directions, which leaves room for optimization using SQLite-specific syntax.

Example result: {1,2}, {1,3}, {2,1}, {2,3}, {3,2}, {3,1}
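As written, the join keeps a pair as soon as a single `data` value occurs the same number of times under both parents, and it yields one row per shared value, so on the question's test data it also returns pairs involving a_id 4, and (1, 2) appears twice. The sketch below, using Python's `sqlite3` module, runs the join as-is and then a hypothetical completion that groups by pair and requires the number of matched values to equal the number of distinct values of both parents; the CTE names `pairs` and `n` are illustrative.

```python
import sqlite3

# Only table b is needed; these are the question's test rows.
SETUP = """
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, data TEXT);
INSERT INTO b (id, a_id, data) VALUES
    (1, 1, 'foo'), (2, 1, 'bar'), (3, 2, 'foo'), (4, 2, 'bar'),
    (5, 3, 'foo'), (6, 3, 'bar'), (7, 4, 'foo');
"""

# The answer's join: one row per data value that occurs equally often
# under two different a_id values.
JOIN = """
SELECT b1.a_id AS lhs, b2.a_id AS rhs
FROM (SELECT data, a_id, COUNT(id) AS a_count FROM b GROUP BY data, a_id) b1
INNER JOIN (SELECT data, a_id, COUNT(id) AS a_count FROM b GROUP BY data, a_id) b2
    ON b1.data = b2.data AND b1.a_count = b2.a_count AND b1.a_id <> b2.a_id
"""

# Hypothetical completion: keep a pair only if every distinct data value of
# both parents was matched with an equal count.
COMPLETE = f"""
WITH pairs AS ({JOIN}),
     n AS (SELECT a_id, COUNT(DISTINCT data) AS n_values FROM b GROUP BY a_id)
SELECT p.lhs, p.rhs
FROM pairs p
JOIN n nl ON nl.a_id = p.lhs
JOIN n nr ON nr.a_id = p.rhs
GROUP BY p.lhs, p.rhs
HAVING COUNT(*) = MAX(nl.n_values) AND COUNT(*) = MAX(nr.n_values)
ORDER BY p.lhs, p.rhs
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
raw = sorted(conn.execute(JOIN).fetchall())
complete = conn.execute(COMPLETE).fetchall()
conn.close()

print(raw)       # 18 rows, including (1, 4) and (4, 1)
print(complete)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```

Changing `<>` to `<` in the join condition would return each pair only once.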
