INSERT INTO only if the dataset does not already exist

Time: 2016-12-06 13:59:31

Tags: sql sql-server insert duplicates

I have this script in my program. It inserts a new dataset into my pricelist table for every entry in user_settings. It works fine.

INSERT INTO [pricelist]
       ([ID]
       ,[plf]
       ,[vdf]
       ,[vdt]
       ,[plID]
       ,[usID])
 SELECT
       NEWID()
       ,5.5
       ,'2017-01-02 00:00:00'
       ,'2027-01-03 00:00:00'
       ,'8020F2FA1C80XXXXXXXXXXXXXXX'
       ,ID
FROM [user_settings]

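But if I need to rerun the script, for example because the first run was cancelled for some reason, I get duplicate entries in the pricelist table. How can I avoid this? Thanks in advance.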

3 answers:

Answer 0: (score: 1)

A simple approach is to use NOT EXISTS:

INSERT INTO [pricelist]
       ([ID]
       ,[plf]
       ,[vdf]
       ,[vdt]
       ,[plID]
       ,[usID])
 SELECT
       NEWID()
       ,5.5
       ,'2017-01-02 00:00:00'
       ,'2027-01-03 00:00:00'
       ,'8020F2FA1C80XXXXXXXXXXXXXXX'
       ,ID
FROM [user_settings] AS us
WHERE NOT EXISTS (SELECT 1 FROM [pricelist] AS p WHERE p.usID = us.ID);

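This is not thread-safe, though: if two runs race each other, you can still end up with duplicates. If something should be unique, give it a unique constraint, e.g.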

ALTER TABLE PriceList ADD CONSTRAINT UQ_PriceList__usID UNIQUE (usID);

This guarantees there are no duplicates. The most thread-safe INSERT I know of uses MERGE WITH (HOLDLOCK):

MERGE [PriceList] WITH (HOLDLOCK) AS p
USING [user_settings] AS us
    ON us.ID = p.usID
WHEN NOT MATCHED THEN 
    INSERT ([ID], [plf], [vdf], [vdt], [plID], [usID])
    VALUES (NEWID(), 5.5, '2017-01-02 00:00:00', '2027-01-03 00:00:00', 
            '8020F2FA1C80XXXXXXXXXXXXXXX', us.[ID]);

This is still no substitute for creating the constraint.
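With the constraint in place, a run that loses a race fails with a duplicate-key error instead of silently inserting a second row. As a rough sketch of how a rerunnable script might tolerate that, the block below wraps the NOT EXISTS insert in TRY/CATCH. It assumes the unique constraint on usID from above exists and that SQL Server 2012 or later is in use (for THROW); it is one possible arrangement, not part of the original script.

```sql
-- Sketch only: assumes UQ_PriceList__usID (or an equivalent unique index) exists.
BEGIN TRY
    INSERT INTO [pricelist] ([ID], [plf], [vdf], [vdt], [plID], [usID])
    SELECT NEWID(), 5.5, '2017-01-02 00:00:00', '2027-01-03 00:00:00',
           '8020F2FA1C80XXXXXXXXXXXXXXX', us.ID
    FROM [user_settings] AS us
    WHERE NOT EXISTS (SELECT 1 FROM [pricelist] AS p WHERE p.usID = us.ID);
END TRY
BEGIN CATCH
    -- 2627: unique/primary key constraint violation
    -- 2601: duplicate key row in a unique index
    -- Any other error is unexpected, so re-raise it unchanged.
    IF ERROR_NUMBER() NOT IN (2627, 2601)
    BEGIN
        THROW;
    END;
END CATCH;
```

Note that when a duplicate-key error is swallowed this way, the whole INSERT statement has been rolled back, so the script would need to be run again to pick up the remaining rows.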

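N.B. It is not very clear from your question how a duplicate is defined. For brevity I have assumed it is just usID in the table, but if it involves more columns, all of the principles above still apply.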
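If a duplicate turns out to mean "the same combination of several columns", the constraint can simply list all of them. The following is a minimal sketch assuming the five data columns from the question together define a duplicate; the constraint name is made up for illustration.

```sql
-- Sketch only: assumes a duplicate is a row with identical
-- plf, vdf, vdt, plID and usID values. The constraint name is hypothetical.
ALTER TABLE PriceList
    ADD CONSTRAINT UQ_PriceList__plf_vdf_vdt_plID_usID
    UNIQUE (plf, vdf, vdt, plID, usID);
```

Adding the constraint fails if the table already contains rows that violate it, so any existing duplicates have to be removed first.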

Addendum

If the above does not work, you may need to add more conditions to the join so that it includes more columns:

MERGE PriceList WITH (HOLDLOCK) AS p
USING 
(   SELECT  ID = NEWID(), 
            plf = 5.5, 
            vdf = '2017-01-02 00:00:00', 
            vdt = '2027-01-03 00:00:00',
            plID = '8020F2FA1C80XXXXXXXXXXXXXXX',
            usID = us.ID
    FROM    user_settings  AS us
    GROUP BY us.ID -- only needed if us.ID is not unique
) AS us
    ON us.usID = p.usID
    AND us.plf = p.plf
    AND us.vdf = p.vdf
    AND us.vdt = p.vdt
    AND us.plID = p.plID
WHEN NOT MATCHED THEN 
    INSERT (ID, plf, vdf, vdt, plID, usID)
    VALUES (us.ID, us.plf, us.vdf, us.vdt, us.plID, us.usID);

Finally, part of the problem may be duplicates that already exist, which you can remove with something like this:

DELETE  t
FROM    (   SELECT  *, RowNumber = ROW_NUMBER() OVER(PARTITION BY plf, vdf, vdt, plID, usID ORDER BY ID)
            FROM    PriceList
        ) AS t
WHERE   t.RowNumber > 1;
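Before running the delete, it may be worth looking at what it would touch. The query below is a small sketch that lists each duplicated combination and how many copies exist, using the same five columns as the PARTITION BY above; the Copies alias is just an illustrative name.

```sql
-- Sketch only: lists every combination that occurs more than once.
-- The DELETE above would keep one row per combination and remove Copies - 1 rows.
SELECT  plf, vdf, vdt, plID, usID, Copies = COUNT(*)
FROM    PriceList
GROUP BY plf, vdf, vdt, plID, usID
HAVING  COUNT(*) > 1;
```

If this returns no rows, there is nothing for the delete to clean up.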

Answer 1: (score: 1)

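You should consider adding a unique constraint to the table to avoid this kind of problem.

Use NOT EXISTS to avoid inserting duplicate data. Based on your comment, you need to check all columns except ID:
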
INSERT INTO [pricelist]
            ([ID],[plf],[vdf],[vdt],[plID],[usID])
SELECT [ID],[plf],[vdf],[vdt],[plID],[usID]
FROM   (SELECT Newid()                       ID,
               5.5                           AS plf,
               '2017-01-02 00:00:00'         AS vdf,
               '2027-01-03 00:00:00'         AS vdt,
               '8020F2FA1C80XXXXXXXXXXXXXXX' AS plID,
               ID                            AS usID
        FROM   [user_settings]) u
WHERE  NOT EXISTS (SELECT 1
                   FROM   pricelist p
                   WHERE  u.plf = p.plf
                          AND u.vdf = p.vdf
                          AND u.vdt = p.vdt
                          AND u.plID = p.plID
                          AND u.usID = p.usID) 

Answer 2: (score: 0)

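 -- Anti-join: insert only for users that have no pricelist row yet.
 -- The join must compare pricelist.usID with user_settings.ID;
 -- pricelist.ID is a fresh NEWID() value and never matches a user ID.
 INSERT INTO [pricelist] ([ID], [plf], [vdf], [vdt], [plID], [usID])
        SELECT NEWID(), 5.5, '2017-01-02 00:00:00', '2027-01-03 00:00:00',
               '8020F2FA1C80XXXXXXXXXXXXXXX', u.[ID]
        FROM [user_settings] u
        LEFT JOIN [pricelist] p
          ON p.[usID] = u.[ID]
        WHERE p.[usID] IS NULL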