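I've written a program that generates tests, each made up of a combination of questions drawn from a large pool. Each test has a number of criteria, and the program only saves a test to the database if it meets all of them.
I wrote the program to spread the questions as evenly as possible: when generating a combination of questions, the algorithm gives priority to the questions in the pool that have been asked the fewest times in previous iterations.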
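The selection rule described above (prefer the questions asked least often so far) can be sketched in a few lines. This is a hypothetical illustration, not the asker's program: the function name `pick_questions`, the in-memory pool and the `Counter` of usage counts are all assumptions, and the per-test criteria check is left out.

```python
import random
from collections import Counter


def pick_questions(pool, ask_counts, n, rng):
    """Hypothetical helper: choose n distinct questions, least-used first.

    pool       -- list of question ids (assumed representation)
    ask_counts -- Counter of how many earlier tests used each question
    rng        -- random.Random instance used to break ties
    """
    candidates = list(pool)
    # Shuffle first so that questions with equal usage are tie-broken randomly.
    rng.shuffle(candidates)
    # list.sort is stable, so the shuffled order is kept among equal counts.
    candidates.sort(key=lambda q: ask_counts[q])
    chosen = candidates[:n]
    # Record the usage so later tests favour the questions not chosen here.
    ask_counts.update(chosen)
    return chosen


counts = Counter()
pool = ["a", "b", "c", "d", "e", "f"]
first = pick_questions(pool, counts, 3, random.Random(0))
second = pick_questions(pool, counts, 3, random.Random(1))
# With 6 questions and 3 per test, the second test must take the 3 unused ones.
print(sorted(first + second))
```

Sorting by usage count after a shuffle is the simplest way to get "least used first, random among equals"; a real generator would presumably retry or skip candidates that break its other criteria.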
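I've created one table, tests, that basically stores the test_id of each test, and another, test_questions, that stores each test_id together with its corresponding question_ids, using n rows per test (where n is the number of questions in each test).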
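Now that the tests are stored in the database, I want to check that the question overlap between each pair of tests falls within a certain range, and I think I should be able to do this with SQL.
Using a self-join, I was able to count the questions common to tests 3 and 5 with this query: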
-- Get the number of questions that are common to tests 3 and 5
SELECT count(tq1.question_id) AS Overlap
FROM test_questions AS tq1
JOIN test_questions AS tq2
ON tq1.question_id = tq2.question_id
WHERE tq1.test_id = 5
AND tq2.test_id = 3;
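To experiment with these queries without MySQL, the two tables can be recreated in an in-memory SQLite database. This is a sketch under assumptions: the column types, the composite primary key and the letter question ids (taken from Thorsten Kettner's three-test example further down) are illustrative, not the asker's actual DDL. The self-join above is run against tests 1 and 2.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the two tables with the 3-test sample data."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: column types and keys are illustrative only.
    conn.executescript("""
        CREATE TABLE tests (
            test_id INTEGER PRIMARY KEY
        );
        CREATE TABLE test_questions (
            test_id     INTEGER NOT NULL REFERENCES tests (test_id),
            question_id TEXT    NOT NULL,
            PRIMARY KEY (test_id, question_id)
        );
    """)
    conn.executemany("INSERT INTO tests (test_id) VALUES (?)", [(1,), (2,), (3,)])
    rows = [(1, "a"), (1, "b"), (1, "c"),
            (2, "a"), (2, "b"), (2, "d"),
            (3, "d"), (3, "e"), (3, "f")]
    conn.executemany(
        "INSERT INTO test_questions (test_id, question_id) VALUES (?, ?)", rows)
    return conn


conn = make_db()
# n rows per test: here every test has n = 3 questions.
per_test = conn.execute(
    "SELECT test_id, COUNT(*) FROM test_questions GROUP BY test_id ORDER BY test_id"
).fetchall()
# The single-pair self-join from the question, pointed at tests 1 and 2.
overlap_1_2 = conn.execute("""
    SELECT count(tq1.question_id) AS Overlap
    FROM test_questions AS tq1
    JOIN test_questions AS tq2 ON tq1.question_id = tq2.question_id
    WHERE tq1.test_id = 1 AND tq2.test_id = 2
""").fetchone()[0]
print(per_test, overlap_1_2)
```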
I was also able to generate every possible pair of tests among the first n (here 5) tests:
-- Get all combinations of pairs of tests from 1 to 5
SELECT t1.test_id AS Test1, t2.test_id AS Test2
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
WHERE t1.test_id <= 5
AND t2.test_id <= 5;
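What I want to do, but so far haven't managed to, is combine the two queries above to show every possible pair of the first 5 tests, together with the number of questions the two tests have in common.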
-- This doesn't work
SELECT t1.test_id AS Test1, t2.test_id AS Test2, count(tq1.question_id) AS Overlap
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
JOIN test_questions AS tq1
ON t1.test_id = tq1.test_id
JOIN test_questions AS tq2
ON t2.test_id = tq2.test_id
WHERE t1.test_id <= 11
AND t2.test_id <= 11
GROUP BY t1.test_id, t2.test_id;
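One way to see why this query fails is to run it on the three-test sample data (an in-memory SQLite database standing in for MySQL; the table contents are assumed from Thorsten Kettner's example). Because tq2 is joined only on test_id, each pair of tests counts every combination of a question from the first test with a question from the second, so every pair reports 3 x 3 = 9. Adding `tq1.question_id = tq2.question_id` to the join fixes the counts but, being an inner join, drops pairs that share no questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data assumed from the three-test example (tests 1-3, questions a-f).
conn.executescript("""
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_questions (test_id INTEGER NOT NULL, question_id TEXT NOT NULL);
    INSERT INTO tests VALUES (1), (2), (3);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

QUERY = """
    SELECT t1.test_id, t2.test_id, count(tq1.question_id) AS Overlap
    FROM tests AS t1
    JOIN tests AS t2 ON t2.test_id > t1.test_id
    JOIN test_questions AS tq1 ON t1.test_id = tq1.test_id
    JOIN test_questions AS tq2 ON t2.test_id = tq2.test_id {extra}
    WHERE t1.test_id <= 11 AND t2.test_id <= 11
    GROUP BY t1.test_id, t2.test_id
    ORDER BY t1.test_id, t2.test_id
"""

# As written: no link between tq1 and tq2, so each group is a 3 x 3 cross product.
broken = conn.execute(QUERY.format(extra="")).fetchall()
# Matching the question ids gives real overlaps, but the inner join loses (1, 3).
matched = conn.execute(
    QUERY.format(extra="AND tq1.question_id = tq2.question_id")).fetchall()
print(broken)   # [(1, 2, 9), (1, 3, 9), (2, 3, 9)]
print(matched)  # [(1, 2, 2), (2, 3, 1)]
```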
I've created simplified versions of both tables (with random data) in this SQL Fiddle.
Note: I'm using MySQL as my DBMS, but the SQL should be ANSI-standard compliant.
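Edit: The program I wrote to generate the tests actually generates more tests than I need, and I only want to compare the first n of them. In the example I added the <= 5 WHERE conditions to ignore the extra tests.
To clarify what I'm looking for, using Thorsten Kettner's sample data: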
test 1: a, b and c
test 2: a, b and d
test 3: d, e and f
The result would be:
First test   Second test   Overlap
Test 1       Test 2        2 (a and b in common)
Test 1       Test 3        0 (no questions in common)
Test 2       Test 3        1 (d is common to both)
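For cross-checking whichever SQL query ends up being used, the expected table above can also be computed directly from the question sets. This is a minimal sketch assuming each test's questions are available as a Python set; `pairwise_overlap` is a hypothetical helper name.

```python
from itertools import combinations


def pairwise_overlap(tests):
    """Hypothetical helper: (test_a, test_b, shared_count) for every unordered pair."""
    return [
        (a, b, len(tests[a] & tests[b]))  # set intersection = common questions
        for a, b in combinations(sorted(tests), 2)
    ]


# Thorsten Kettner's sample data.
sample = {
    1: {"a", "b", "c"},
    2: {"a", "b", "d"},
    3: {"d", "e", "f"},
}
result = pairwise_overlap(sample)
print(result)  # [(1, 2, 2), (1, 3, 0), (2, 3, 1)]
```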
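Answer 0 (score: 2)
You basically just need a group by on the first query. I also added another condition so that the test IDs come out in order (each pair appears only once):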
SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1 JOIN
test_questions tq2
ON tq1.question_id = tq2.question_id and
tq1.test_id < tq2.test_id
GROUP BY tq1.test_id, tq2.test_id;
这是标准的SQL。
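Running this self-join on the three-test sample data (in-memory SQLite as a stand-in for MySQL; data assumed from Thorsten Kettner's example) shows the effect of the join type. With a LEFT JOIN, questions that no later test shares produce extra rows whose second test id is NULL; with an inner JOIN only real pairs remain. Either way, pairs with no common questions, such as tests 1 and 3, do not appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data assumed from the three-test example.
conn.executescript("""
    CREATE TABLE test_questions (test_id INTEGER NOT NULL, question_id TEXT NOT NULL);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

QUERY = """
    SELECT tq1.test_id AS test_id1, tq2.test_id AS test_id2,
           count(tq1.question_id) AS Overlap
    FROM test_questions tq1 {join_type}
         test_questions tq2
      ON tq1.question_id = tq2.question_id
     AND tq1.test_id < tq2.test_id
    GROUP BY tq1.test_id, tq2.test_id
    ORDER BY tq1.test_id, tq2.test_id
"""

left = conn.execute(QUERY.format(join_type="LEFT JOIN")).fetchall()
inner = conn.execute(QUERY.format(join_type="JOIN")).fetchall()
print(left)   # unmatched questions show up as (test_id, None, n) rows
print(inner)  # [(1, 2, 2), (2, 3, 1)]
```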
If you want all pairs of tests, even those with no questions in common, here is another approach:
SELECT t1.test_id as test_id1, t2.test_id as test_id2, count(tq2.question_id) AS Overlap
FROM tests t1 CROSS JOIN
tests t2 LEFT JOIN
test_questions tq1
on t1.test_id = tq1.test_id LEFT JOIN
test_questions tq2
ON t2.test_id = tq2.test_id and tq1.question_id = tq2.question_id
WHERE t1.test_id < t2.test_id
GROUP BY t1.test_id, t2.test_id;
This assumes you have a table with one row per test. If not, replace tests with (select distinct test_id from test_questions).
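The all-pairs version can be checked the same way. The sketch below (in-memory SQLite standing in for MySQL, sample data assumed from Thorsten Kettner's example) restricts the cross join to `t1.test_id < t2.test_id` so that each unordered pair is listed once, and runs it both against `tests` and against the `(select distinct test_id from test_questions)` substitute. Unlike the self-join, it reports tests 1 and 3 with an overlap of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data assumed from the three-test example.
conn.executescript("""
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_questions (test_id INTEGER NOT NULL, question_id TEXT NOT NULL);
    INSERT INTO tests VALUES (1), (2), (3);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

QUERY = """
    SELECT t1.test_id AS test_id1, t2.test_id AS test_id2,
           count(tq2.question_id) AS Overlap  -- counts only matched rows, so 0 is possible
    FROM {src} t1
    CROSS JOIN {src} t2
    LEFT JOIN test_questions tq1
      ON t1.test_id = tq1.test_id
    LEFT JOIN test_questions tq2
      ON t2.test_id = tq2.test_id AND tq1.question_id = tq2.question_id
    WHERE t1.test_id < t2.test_id  -- each unordered pair once, no self-pairs
    GROUP BY t1.test_id, t2.test_id
    ORDER BY t1.test_id, t2.test_id
"""

from_tests = conn.execute(QUERY.format(src="tests")).fetchall()
from_distinct = conn.execute(
    QUERY.format(src="(SELECT DISTINCT test_id FROM test_questions)")).fetchall()
print(from_tests)     # [(1, 2, 2), (1, 3, 0), (2, 3, 1)]
print(from_distinct)  # same rows
```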
Answer 1 (score: 1)
select test_combinations.t1_test_id, test_combinations.t2_test_id, count(q2.question_id)
from (
  select t1.test_id as t1_test_id, t2.test_id as t2_test_id
  from (select test_id from tests where test_id <= 5) t1
  inner join (select test_id from tests where test_id <= 5) t2
    on t2.test_id > t1.test_id
) test_combinations
inner join test_questions q1
  on q1.test_id = test_combinations.t1_test_id
left join test_questions q2
  on q2.test_id = test_combinations.t2_test_id
  and q2.question_id = q1.question_id
group by test_combinations.t1_test_id, test_combinations.t2_test_id
order by test_combinations.t1_test_id, test_combinations.t2_test_id;
I've added a test with no overlapping questions and removed the test_id <= 5 restriction, so you can see the pairs of tests that have no questions in common: http://sqlfiddle.com/#!2/e83aa/1
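The structure of this answer's query (a derived table of ordered test pairs, an inner join to the first test's questions, and a left join to the matching questions of the second test) can be exercised on the sample data plus a hypothetical fourth test that shares no questions, in an in-memory SQLite database. The exact pair-generating subquery used here (`t2.test_id > t1.test_id` among tests with `test_id <= 5`) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data plus a hypothetical test 4 that shares no questions with the others.
conn.executescript("""
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_questions (test_id INTEGER NOT NULL, question_id TEXT NOT NULL);
    INSERT INTO tests VALUES (1), (2), (3), (4);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f'),
        (4, 'g'), (4, 'h'), (4, 'i');
""")

rows = conn.execute("""
    SELECT test_combinations.t1_test_id, test_combinations.t2_test_id,
           count(q2.question_id) AS Overlap
    FROM (
        -- every ordered pair among the first n tests (assumed subquery)
        SELECT t1.test_id AS t1_test_id, t2.test_id AS t2_test_id
        FROM (SELECT test_id FROM tests WHERE test_id <= ?) t1
        JOIN (SELECT test_id FROM tests WHERE test_id <= ?) t2
          ON t2.test_id > t1.test_id
    ) test_combinations
    JOIN test_questions q1 ON q1.test_id = test_combinations.t1_test_id
    LEFT JOIN test_questions q2
      ON q2.test_id = test_combinations.t2_test_id
     AND q2.question_id = q1.question_id
    GROUP BY test_combinations.t1_test_id, test_combinations.t2_test_id
    ORDER BY test_combinations.t1_test_id, test_combinations.t2_test_id
""", (5, 5)).fetchall()
print(rows)
```

Every pair involving test 4 comes back with an overlap of 0, because the left join keeps the first test's questions even when the second test has no match.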
Answer 2 (score: 0)
I modified Gordon's answer; this query gives the list of test combinations and their corresponding overlap (questions in common):
SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1
JOIN test_questions tq2
ON tq1.question_id = tq2.question_id
AND tq1.test_id < tq2.test_id
WHERE tq1.test_id <= 5
AND tq2.test_id <= 5
GROUP BY tq1.test_id, tq2.test_id;
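Like the other inner-join versions, this query omits pairs that share no questions, which matters for the range check the question is ultimately after. A minimal sketch of that check, assuming hypothetical bounds `min_overlap` and `max_overlap` and using SQLite in memory with the sample data: it keeps the all-pairs LEFT JOIN form and uses `HAVING ... NOT BETWEEN` to return only the pairs whose overlap falls outside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data assumed from the three-test example.
conn.executescript("""
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_questions (test_id INTEGER NOT NULL, question_id TEXT NOT NULL);
    INSERT INTO tests VALUES (1), (2), (3);
    INSERT INTO test_questions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'd'),
        (3, 'd'), (3, 'e'), (3, 'f');
""")

# Answer 2's query: only pairs with at least one shared question appear.
answer2 = conn.execute("""
    SELECT tq1.test_id, tq2.test_id, count(tq1.question_id) AS Overlap
    FROM test_questions tq1
    JOIN test_questions tq2
      ON tq1.question_id = tq2.question_id AND tq1.test_id < tq2.test_id
    WHERE tq1.test_id <= 5 AND tq2.test_id <= 5
    GROUP BY tq1.test_id, tq2.test_id
    ORDER BY tq1.test_id, tq2.test_id
""").fetchall()

# All pairs among the first n tests whose overlap is outside [min, max].
OUT_OF_RANGE = """
    SELECT t1.test_id, t2.test_id, count(tq2.question_id) AS Overlap
    FROM tests t1
    JOIN tests t2 ON t2.test_id > t1.test_id
    LEFT JOIN test_questions tq1 ON tq1.test_id = t1.test_id
    LEFT JOIN test_questions tq2
      ON tq2.test_id = t2.test_id AND tq2.question_id = tq1.question_id
    WHERE t1.test_id <= ? AND t2.test_id <= ?
    GROUP BY t1.test_id, t2.test_id
    HAVING count(tq2.question_id) NOT BETWEEN ? AND ?
    ORDER BY t1.test_id, t2.test_id
"""

first_n, min_overlap, max_overlap = 5, 1, 1  # hypothetical limits
violations = conn.execute(
    OUT_OF_RANGE, (first_n, first_n, min_overlap, max_overlap)).fetchall()
print(answer2)     # [(1, 2, 2), (2, 3, 1)]
print(violations)  # [(1, 2, 2), (1, 3, 0)]
```

An empty result from the range query would mean every pair of the first n tests is within the allowed overlap.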