Query to check for commonality (intersection) between every possible combination of pairs

Date: 2014-03-21 11:31:59

Tags: mysql sql combinations set-intersection

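I wrote a program that generates tests made up of combinations of questions drawn from a large pool. Each test has to satisfy a number of criteria, and the program saves a test to the database only if it meets all of them.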

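I wrote the program to spread the questions as evenly as possible: when generating combinations of questions, the algorithm gives priority to the questions in the pool that have been asked the fewest times in previous iterations.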
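A minimal sketch of the least-asked-first idea, assuming question ids are strings and usage counts are kept in memory. pick_questions is a hypothetical helper, not the author's program, and it leaves out the acceptance criteria the real generator checks before saving a test.

```python
from collections import Counter


def pick_questions(pool, usage, n):
    """Return n questions, least-asked first (hypothetical helper).

    `usage` maps question id -> times asked so far; a Counter returns 0 for
    questions never asked. sorted() is stable, so ties keep pool order.
    """
    return sorted(pool, key=lambda q: usage[q])[:n]


pool = ["a", "b", "c", "d", "e", "f"]
usage = Counter()
generated = []

for _ in range(2):
    test = pick_questions(pool, usage, 3)
    # The real program would check its criteria here and only then save the test.
    usage.update(test)  # these questions have now been asked once more
    generated.append(test)

print(generated)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Updating the counts only after a test is accepted keeps rejected candidates from skewing the distribution.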

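I created a table, tests, which basically stores a test_id for each test, and another table, test_questions, which stores each test_id together with its corresponding question_ids, using n rows per test (where n is the number of questions in each test).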
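The two-table layout can be sketched as follows. SQLite (through Python's standard sqlite3 module) stands in for MySQL, and the column types, the composite primary key and the letter question ids are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per generated test
    CREATE TABLE tests (
        test_id INTEGER PRIMARY KEY
    );
    -- n rows per test, one for each question it contains
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL REFERENCES tests (test_id),
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)  -- a question appears at most once per test
    );
    INSERT INTO tests VALUES (1), (2);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd');
""")

# Each test should have exactly n = 3 rows in test_questions.
rows_per_test = dict(conn.execute(
    "SELECT test_id, COUNT(*) FROM test_questions GROUP BY test_id"
).fetchall())
```

The composite primary key matters for the overlap queries: if a question could appear twice in one test, every join on question_id would count it twice.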

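Now that the tests are stored in the database, I want to check that the question overlap between each pair of tests falls within a certain range, and I think I should be able to do this in SQL.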

Using a self-join, I was able to get the number of questions common to tests 3 and 5 with this query:

-- Get the number of questions that are common to tests 3 and 5
SELECT count(tq1.question_id) AS Overlap
FROM test_questions AS tq1
JOIN test_questions AS tq2
ON tq1.question_id = tq2.question_id
WHERE tq1.test_id = 5
AND tq2.test_id = 3;
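To see the self-join in action, the sketch below loads Thorsten Kettner's three-test sample data (quoted in the question's clarification) into an in-memory SQLite database, standing in for MySQL, and runs the same query with the two test ids as parameters. overlap is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL,
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)
    );
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

# Same self-join as above, with the two test ids passed as parameters.
OVERLAP_SQL = """
    SELECT count(tq1.question_id) AS Overlap
    FROM test_questions AS tq1
    JOIN test_questions AS tq2
      ON tq1.question_id = tq2.question_id
    WHERE tq1.test_id = ?
      AND tq2.test_id = ?
"""


def overlap(test_a, test_b):
    """Number of questions the two tests share (0 if none)."""
    return conn.execute(OVERLAP_SQL, (test_a, test_b)).fetchone()[0]


print(overlap(1, 2), overlap(2, 3), overlap(1, 3))  # 2 1 0
```

Because there is no GROUP BY, this query always returns exactly one row, so a pair with nothing in common still reports 0. That guarantee disappears once many pairs are grouped in one query.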

I was able to generate every possible pair combination of the first n (here 5) tests:

-- Get all combinations of pairs of tests from 1 to 5
SELECT t1.test_id AS Test1, t2.test_id AS Test2
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
WHERE t1.test_id <= 5
AND t2.test_id <= 5;
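The t2.test_id > t1.test_id join condition is what keeps each pair unique and excludes self-pairs, so k tests yield k(k-1)/2 rows. A quick check against Python's itertools.combinations, using SQLite as a stand-in for MySQL and seven hypothetical tests of which only the first five are compared (the ORDER BY is added only to make the output order deterministic):

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (test_id INTEGER PRIMARY KEY)")
# Hypothetical: 7 tests were generated, but only the first 5 are compared.
conn.executemany("INSERT INTO tests (test_id) VALUES (?)", [(i,) for i in range(1, 8)])

pairs = conn.execute("""
    SELECT t1.test_id AS Test1, t2.test_id AS Test2
    FROM tests AS t1
    JOIN tests AS t2
      ON t2.test_id > t1.test_id   -- each unordered pair once, no self-pairs
    WHERE t1.test_id <= 5
      AND t2.test_id <= 5
    ORDER BY Test1, Test2
""").fetchall()

print(len(pairs))  # 10, i.e. C(5, 2)
```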

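What I want to do, but have so far failed to do, is combine the two queries above to show every possible pair combination of the first 5 tests, together with the number of questions the two tests have in common.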

-- This doesn't work
SELECT t1.test_id AS Test1, t2.test_id AS Test2, count(tq1.question_id) AS Overlap
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
JOIN test_questions AS tq1
ON t1.test_id = tq1.test_id
JOIN test_questions AS tq2
ON t2.test_id = tq2.test_id
WHERE t1.test_id <= 11
AND t2.test_id <= 11
GROUP BY t1.test_id, t2.test_id;
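The reason this query fails is that nothing links tq1.question_id to tq2.question_id: every question row of the first test is paired with every question row of the second, so each pair's count is simply (questions in the first test) × (questions in the second). Running it on Thorsten Kettner's sample data from the clarification, with SQLite standing in for MySQL, gives 3 × 3 = 9 for every pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL,
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)
    );
    INSERT INTO tests VALUES (1), (2), (3);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

# The non-working query, unchanged apart from layout.
rows = conn.execute("""
    SELECT t1.test_id AS Test1, t2.test_id AS Test2, count(tq1.question_id) AS Overlap
    FROM tests AS t1
    JOIN tests AS t2 ON t2.test_id > t1.test_id
    JOIN test_questions AS tq1 ON t1.test_id = tq1.test_id
    JOIN test_questions AS tq2 ON t2.test_id = tq2.test_id  -- no question_id match here
    WHERE t1.test_id <= 11
      AND t2.test_id <= 11
    GROUP BY t1.test_id, t2.test_id
""").fetchall()

# Every pair reports 3 x 3 = 9, regardless of how many questions it really shares.
overlap = {(a, b): n for a, b, n in rows}
```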

I have created simplified versions of the two tables (with random data) in this SQL Fiddle.

Note: I am using MySQL as my DBMS, but the SQL should be ANSI-standard compatible.

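Edit: The program I wrote to generate tests actually generates more tests than I need, and I only want to compare the first n tests. That is why I added the <= 5 WHERE conditions in the examples: to ignore the extra tests.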

To clarify what I'm looking for, using Thorsten Kettner's sample data:

test 1: a, b and c
test 2: a, b and d
test 3: d, e and f

The result would be:

Test A  Test B  Overlap
Test1   Test2   2       (a and b in common)
Test1   Test3   0       (no questions in common)
Test2   Test3   1       (d is common to both)
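The expected result is just the size of the set intersection for each pair, which gives a reference to check any SQL attempt against. A plain-Python sketch using the same three tests:

```python
from itertools import combinations

# Thorsten Kettner's sample data: test id -> set of question ids
tests = {
    1: {"a", "b", "c"},
    2: {"a", "b", "d"},
    3: {"d", "e", "f"},
}

# combinations() yields each unordered pair once, like t1.test_id < t2.test_id
overlap = {
    (t1, t2): len(tests[t1] & tests[t2])
    for t1, t2 in combinations(sorted(tests), 2)
}
print(overlap)  # {(1, 2): 2, (1, 3): 0, (2, 3): 1}
```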

3 answers:

Answer 0 (score: 2)

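You basically just need a group by on your first query. I also added another condition so that each pair is listed only once, with the lower test id first: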

SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1 LEFT JOIN
     test_questions tq2
     ON tq1.question_id = tq2.question_id and
        tq1.test_id < tq2.test_id
GROUP BY tq1.test_id, tq2.test_id;

This is standard SQL.
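One thing to watch with this first query: because the tq1.test_id < tq2.test_id condition sits in the ON clause of a LEFT JOIN, questions with no match in a later test come back in a group whose second test id is NULL, and a pair with no common questions (tests 1 and 3 in Thorsten Kettner's sample data) does not appear at all. A quick check, with SQLite standing in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL,
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)
    );
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

rows = conn.execute("""
    SELECT tq1.test_id AS test_id1, tq2.test_id AS test_id2,
           count(tq1.question_id) AS Overlap
    FROM test_questions tq1
    LEFT JOIN test_questions tq2
      ON tq1.question_id = tq2.question_id
     AND tq1.test_id < tq2.test_id
    GROUP BY tq1.test_id, tq2.test_id
""").fetchall()

result = {(a, b): n for a, b, n in rows}
# test_id2 is None (SQL NULL) in groups of questions that matched no later test,
# and the zero-overlap pair (1, 3) is missing entirely.
```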

If you want all pairs of tests, including those with no questions in common, here is another approach:

SELECT t1.test_id as test_id1, t2.test_id as test_id2, count(tq2.question_id) AS Overlap
FROM tests t1 CROSS JOIN
     tests t2 LEFT JOIN
     test_questions tq1
     on t1.test_id = tq1.test_id LEFT JOIN
     test_questions tq2
     ON t2.test_id = tq2.test_id and tq1.question_id = tq2.question_id 
WHERE t1.test_id < t2.test_id  -- list each pair once and skip self-pairs
GROUP BY t1.test_id, t2.test_id;

This assumes you have a table with one row per test. If not, replace tests with (select distinct test_id from test_questions).
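A sketch of this all-pairs query without a tests table, using the derived-table substitution just described and restricted with WHERE t1.test_id < t2.test_id so that each unordered pair appears once and self-pairs are excluded. SQLite stands in for MySQL, and the data is Thorsten Kettner's sample from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL,
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)
    );
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

# No tests table here, so each copy of it is replaced by a derived table.
rows = conn.execute("""
    SELECT t1.test_id AS test_id1, t2.test_id AS test_id2,
           count(tq2.question_id) AS Overlap
    FROM (SELECT DISTINCT test_id FROM test_questions) t1
    CROSS JOIN (SELECT DISTINCT test_id FROM test_questions) t2
    LEFT JOIN test_questions tq1
      ON t1.test_id = tq1.test_id
    LEFT JOIN test_questions tq2
      ON t2.test_id = tq2.test_id
     AND tq1.question_id = tq2.question_id
    WHERE t1.test_id < t2.test_id   -- each unordered pair once
    GROUP BY t1.test_id, t2.test_id
""").fetchall()

result = {(a, b): n for a, b, n in rows}
# count(tq2.question_id) ignores the NULLs from the outer join, so (1, 3) gets 0.
```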

Answer 1 (score: 1)

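  • Step 1: find all test combinations, e.g. 1-2, 1-3, 2-3.
  • Step 2: join all the questions of the first test.
  • Step 3: outer join the same questions in the second test (where they exist).
  • Last step: count the matching questions found for each test combination.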
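
select test_combinations.t1_test_id, test_combinations.t2_test_id, count(q2.question_id)
from
(
    -- Step 1: every pair among the first 5 tests, each pair once (t2 > t1)
    select t1.test_id as t1_test_id, t2.test_id as t2_test_id
    from (select test_id from tests where test_id <= 5) t1
    inner join (select test_id from tests where test_id <= 5) t2 on t2.test_id > t1.test_id
) test_combinations
-- Step 2: all questions of the first test
inner join test_questions q1 on q1.test_id = test_combinations.t1_test_id
-- Step 3: the same question in the second test, if it exists (NULL otherwise)
left join test_questions q2 on q2.test_id = test_combinations.t2_test_id and q2.question_id = q1.question_id
-- Last step: count(q2.question_id) skips NULLs, so pairs with no shared questions get 0
group by test_combinations.t1_test_id, test_combinations.t2_test_id
order by test_combinations.t1_test_id, test_combinations.t2_test_id;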

I have added a test with no overlapping questions and removed the test_id <= 5 restriction, so you can see test pairs that have no questions in common: http://sqlfiddle.com/#!2/e83aa/1
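A sketch of this answer's approach, assuming the combination subquery pairs the first five tests on t2.test_id > t1.test_id (the derived table is aliased tc for brevity). SQLite stands in for MySQL. On top of the question's sample data it adds a hypothetical test 4 that shares nothing and a hypothetical test 6 beyond the <= 5 limit, so both the zero-overlap rows and the filter are visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL,
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)
    );
    INSERT INTO tests VALUES (1), (2), (3), (4), (6);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f'),
        (4, 'g'), (4, 'h'), (4, 'i'),   -- hypothetical: shares nothing
        (6, 'a'), (6, 'b'), (6, 'c');   -- hypothetical: beyond the <= 5 limit
""")

rows = conn.execute("""
    SELECT tc.t1_test_id, tc.t2_test_id, count(q2.question_id)
    FROM (
        SELECT t1.test_id AS t1_test_id, t2.test_id AS t2_test_id
        FROM (SELECT test_id FROM tests WHERE test_id <= 5) t1
        JOIN (SELECT test_id FROM tests WHERE test_id <= 5) t2
          ON t2.test_id > t1.test_id
    ) tc
    JOIN test_questions q1 ON q1.test_id = tc.t1_test_id
    LEFT JOIN test_questions q2
      ON q2.test_id = tc.t2_test_id
     AND q2.question_id = q1.question_id
    GROUP BY tc.t1_test_id, tc.t2_test_id
    ORDER BY tc.t1_test_id, tc.t2_test_id
""").fetchall()

for row in rows:
    print(row)  # (test, test, overlap); test 6 never appears
```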

Answer 2 (score: 0)

I modified Gordon's answer; this query gives the list of test combinations and their corresponding overlap (questions in common):

SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1
JOIN test_questions tq2
ON tq1.question_id = tq2.question_id
AND tq1.test_id < tq2.test_id 
WHERE tq1.test_id <= 5
AND tq2.test_id <= 5
GROUP BY tq1.test_id, tq2.test_id;
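This version gives correct counts for every pair that shares at least one question. Because it is an inner join, though, a pair with zero overlap produces no rows and is missing from the output rather than listed with 0. If zero-overlap pairs matter for the range check, one of the tests-based queries above is needed. A check with Thorsten Kettner's sample data, with SQLite standing in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_questions (
        test_id     INTEGER NOT NULL,
        question_id TEXT    NOT NULL,
        PRIMARY KEY (test_id, question_id)
    );
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

rows = conn.execute("""
    SELECT tq1.test_id AS test_id1, tq2.test_id AS test_id2,
           count(tq1.question_id) AS Overlap
    FROM test_questions tq1
    JOIN test_questions tq2
      ON tq1.question_id = tq2.question_id
     AND tq1.test_id < tq2.test_id
    WHERE tq1.test_id <= 5
      AND tq2.test_id <= 5
    GROUP BY tq1.test_id, tq2.test_id
""").fetchall()

result = {(a, b): n for a, b, n in rows}
# (1, 3) is absent: with an inner join, a pair sharing no question yields no rows.
```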