SQL:如何生成不存在的组合

时间:2018-10-23 17:08:02

标签: sql sql-server

我想通过两个单词的组合来创建一个庞大的列表。字典中的每个单词都应与所有其他单词组合。换句话说,我将有Total^2个组合。每个单词都有一个唯一的ID,我只在组合表中产生两个ID的组合。

我想定期检查是否缺少任何组合,以便生成并添加到数据库中。我发现this Q/A会生成所有可能的组合,但我不知道如何使用SQL查询来查找不存在的组合,例如:

select * from words a ... join words b 
where (a.id, b.id) not in (select * from combinaions) 

如果SQL对此没有直接解决方案,请您提出一种算法来以编程方式执行此操作。请注意,由于我删除了一些单词,所以可能缺少一些 ID ,所以我不能对整数使用线性循环。

组合表具有两列(第一个ID,第二个ID),两个ID均来自表单词

3 个答案:

答案 0 :(得分:4)

您可以使用交叉连接具有所有可能的组合,然后根据条件,可以删除已经存在的组合。

Select * from words a cross join words b 
where not exists (select * from combinations c where c.first_id = a.id and c.second_id = b.id) 

答案 1 :(得分:1)

正如@VahiD所指出的,CROSS JOIN是难题的中心部分。除了使用子查询,您还可以将现有组合表LEFT JOIN替换为单词CROSS JOIN,并检查NULL(这意味着给定的笛卡尔积组合不会存在于您现有的组合表中。

例如:

WITH 
    -- sample data (notice that there's no word with ID of 3)
    words(word_id, word) AS
    (
        SELECT 1, 'apple'   UNION ALL
        SELECT 2, 'pear'    UNION ALL
        SELECT 4, 'orange'  UNION ALL
        SELECT 5, 'banana'
    )
    -- existing combinations
    ,combinations(first_id, second_id) AS
    (
        SELECT 1, 2 UNION ALL
        SELECT 1, 5 UNION ALL
        SELECT 2, 4 UNION ALL
        SELECT 2, 5 UNION ALL
        SELECT 4, 5
    )
    -- this is the CTE you'll use to create the cartesian product
    -- of all words in your words table. You can also put this as a 
    -- sub-query, but I'd argue that a CTE makes it clearer.
    ,cartesian(w1_id, w1_word, w2_id, w2_word) AS
    (
        SELECT *
        FROM words w1, words w2
    )
-- the actual query
SELECT * 
FROM cartesian
    LEFT JOIN combinations ON
        combinations.first_id = cartesian.w1_id
        AND combinations.second_id = cartesian.w2_id
WHERE combinations.first_id IS NULL

现在,一个重要的警告是,在切换word1word2时,此查询不会认为组合是相同的。也就是说,(1,2)(2,1)不同。但是,解决此问题就像调整联接一样简单:

SELECT * 
FROM cartesian
    LEFT JOIN combinations ON
        (combinations.first_id = cartesian.w1_id OR combinations.first_id = cartesian.w2_id)
        AND
        (combinations.second_id = cartesian.w1_id OR combinations.second_id = cartesian.w2_id)
WHERE combinations.first_id IS NULL

答案 2 :(得分:1)

这是另一种选择。在子查询中建立完整列表,然后将其保留在组合表的外部,以查找缺失的内容。

DECLARE @Words TABLE
    (
        [Id] INT
      , [Word] NVARCHAR(200)
    );

DECLARE @WordCombo TABLE
    (
        [Id1] INT
      , [Id2] INT
    );

INSERT INTO @Words (
                       [Id]
                     , [Word]
                   )
VALUES ( 1, N'Cat' )
     , ( 2, N'Taco' )
     , ( 3, N'Test' )
     , ( 4, N'Cake' )
     , ( 5, N'Apple' )
     , ( 6, N'Pear' );

INSERT INTO @WordCombo (
                           [Id1]
                         , [Id2]
                       )
VALUES ( 1, 2 )
     , ( 2, 6 )
     , ( 5, 3 )
     , ( 5, 1 );

--select from a sub query that builds out all combinations and then left outer to find what's missing in @WordCombo
SELECT          [fulllist].[Id1]
              , [fulllist].[Id2]
FROM            (
                    --Rebuild full list
                    SELECT     [a].[Id] AS [Id1]
                             , [b].[Id] AS [Id2]
                    FROM       @Words [a]
                    INNER JOIN @Words [b]
                        ON 1 = 1
                    WHERE      [a].[Id] <> [b].[Id] --Would a word be combined with itself?

                ) AS [fulllist]
LEFT OUTER JOIN @WordCombo [wc]
    ON [wc].[Id1] = [fulllist].[Id1]
       AND [wc].[Id2] = [fulllist].[Id2]
WHERE           [wc].[Id1] IS NULL;