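I'm currently upgrading a database on SQL Server. Right now I'm trying to clean up a table to get rid of a large number of duplicate records, but I can't get my query to work.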
CREATE TABLE Temp_A
(
Order_ID INT NOT NULL,
Job_Number VARCHAR(20) NOT NULL,
Supplier_Name VARCHAR(50) NOT NULL
);
BULK INSERT Temp_A
FROM 'This\is\the\file\path.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
CREATE TABLE Temp_B
(
Order_ID INT NOT NULL,
Job_Number VARCHAR(20) NOT NULL,
Supplier_Name VARCHAR(50) NOT NULL,
CONSTRAINT Temp_Con UNIQUE (Order_ID, Job_Number)
);
INSERT INTO Temp_B
SELECT Order_ID, Job_Number, Supplier_Name
FROM Temp_A AS A
WHERE NOT EXISTS (SELECT 1
FROM Temp_B AS B
WHERE B.Order_ID = A.Order_ID
AND B.Job_Number = A.Job_Number)
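The part of my code that doesn't work is the final INSERT INTO Temp_B block. What I'm doing is loading the data from the CSV file into the Temp_A table, then trying to grab all the unique Order_ID & Part_Number pairs and store them in the Temp_B table.
I'd love to go in and delete the duplicates by hand, but there are thousands of records, so... yeah, that would take forever. I don't know where to start.
Edit: adding the error message I receive:
Violation of UNIQUE KEY constraint 'Temp_Con'. Cannot insert duplicate key in object 'dbo.Temp_B'. The duplicate key value is (3, L154).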
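Answer 0: (score: 2)
Your unique constraint covers two columns, but your source data has three. If several rows share the same Order_ID and Job_Number, which one would you pick?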
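To see which pairs are affected before deciding, a query along these lines may help. It is only a sketch against the Temp_A table from the question; the column aliases Total_Rows and Distinct_Suppliers are made up for illustration.

```sql
-- List every (Order_ID, Job_Number) pair that occurs more than once in Temp_A.
-- Total_Rows and Distinct_Suppliers are illustrative aliases, not from the question.
SELECT
    Order_ID,
    Job_Number,
    Total_Rows         = COUNT(*),
    Distinct_Suppliers = COUNT(DISTINCT Supplier_Name)
FROM
    Temp_A
GROUP BY
    Order_ID,
    Job_Number
HAVING
    COUNT(*) > 1
ORDER BY
    Order_ID,
    Job_Number;
```

Pairs where Distinct_Suppliers is greater than 1 are the ones where you actually have to choose between suppliers; pairs where it is 1 are plain repeated rows.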
Use GROUP BY with MAX().
INSERT INTO Temp_B (
Order_ID,
Job_Number,
Supplier_Name)
SELECT
Order_ID,
Job_Number,
Supplier_Name = MAX(Supplier_Name)
FROM
Temp_A AS A
WHERE
NOT EXISTS (
SELECT
'not yet in Temp_B'
FROM
Temp_B AS B
WHERE
B.Order_ID = A.Order_ID AND
B.Job_Number = A.Job_Number)
GROUP BY
A.Order_ID,
A.Job_Number
Alternatively, use ROW_NUMBER().
;WITH MissingRanked AS
(
SELECT
Order_ID,
Job_Number,
Supplier_Name,
Ranking = ROW_NUMBER() OVER (
PARTITION BY
A.Order_ID,
Job_Number
ORDER BY
(SELECT NULL)) -- Your ordering criteria here
FROM
Temp_A AS A
WHERE
NOT EXISTS (
SELECT
'not yet in Temp_B'
FROM
Temp_B AS B
WHERE
B.Order_ID = A.Order_ID AND
B.Job_Number = A.Job_Number)
)
INSERT INTO Temp_B (
Order_ID,
Job_Number,
Supplier_Name)
SELECT
Order_ID,
Job_Number,
Supplier_Name
FROM
MissingRanked AS M
WHERE
M.Ranking = 1
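With ORDER BY (SELECT NULL), which row gets Ranking = 1 within a pair is arbitrary. As a sketch of what "your ordering criteria" could look like, the query below previews the ranking with Supplier_Name DESC as the tie-breaker. That choice is an assumption made for illustration; it should keep the same supplier that MAX(Supplier_Name) would pick.

```sql
-- Preview which row would be kept per (Order_ID, Job_Number) pair.
-- Ordering by Supplier_Name DESC is an assumed criterion; substitute your own.
SELECT
    Order_ID,
    Job_Number,
    Supplier_Name,
    Ranking = ROW_NUMBER() OVER (
        PARTITION BY
            Order_ID,
            Job_Number
        ORDER BY
            Supplier_Name DESC)
FROM
    Temp_A
ORDER BY
    Order_ID,
    Job_Number,
    Ranking;
```

Rows with Ranking = 1 are the ones the insert would keep. Note that exact duplicates still tie under this ordering, which is harmless here because the tied rows are identical.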
Answer 1: (score: 0)
I would try using GROUP BY to make the rows in my INSERT INTO unique, like this:
INSERT INTO Temp_B
SELECT Order_ID, Job_Number, Supplier_Name
FROM Temp_A AS A
GROUP BY A.Order_ID, A.Job_Number, A.Supplier_Name
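I don't have data to test with, but I think this will work. Your question mentions Order_ID & Part_Number, but the join as written doesn't; I'm guessing that's a typo, but you get the idea. This is the direction I'd go. You could also use DISTINCT, but I prefer GROUP BY.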
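Answer 2: (score: 0)
Your approach doesn't work because the subselect sees the records as they were before the insert; that is, it sees an empty table.
What you need is the DISTINCT keyword.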
INSERT INTO Temp_B
SELECT DISTINCT Order_ID, Job_Number, Supplier_Name
FROM Temp_A
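The point about the subselect seeing the table as it was before the insert can be reproduced in isolation. The sketch below uses two scratch temp tables, #Source and #Target, which are hypothetical and not part of the question; only the key value (3, 'L154') is borrowed from the error message.

```sql
-- Hypothetical scratch tables used only to reproduce the behaviour.
CREATE TABLE #Source (
    Order_ID   INT         NOT NULL,
    Job_Number VARCHAR(20) NOT NULL
);
CREATE TABLE #Target (
    Order_ID   INT         NOT NULL,
    Job_Number VARCHAR(20) NOT NULL,
    UNIQUE (Order_ID, Job_Number)
);

-- Two source rows carrying the same key.
INSERT INTO #Source (Order_ID, Job_Number)
VALUES (3, 'L154'), (3, 'L154');

-- NOT EXISTS is evaluated against #Target as it stood when the statement
-- started (empty), so both rows qualify and the insert hits the constraint.
INSERT INTO #Target (Order_ID, Job_Number)
SELECT S.Order_ID, S.Job_Number
FROM #Source AS S
WHERE NOT EXISTS (SELECT 1
                  FROM #Target AS T
                  WHERE T.Order_ID = S.Order_ID
                    AND T.Job_Number = S.Job_Number);

-- Clean up the scratch tables.
DROP TABLE #Target;
DROP TABLE #Source;
```

The second INSERT should fail with a UNIQUE KEY violation, because the duplicates are within the rows being inserted rather than between the source and rows already in the target.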
Answer 3: (score: 0)
You can add the DISTINCT keyword to the INSERT query:
INSERT INTO Temp_B
SELECT DISTINCT Order_ID, Job_Number, Supplier_Name
FROM Temp_A AS A
WHERE NOT EXISTS (
SELECT 1 FROM Temp_B AS B
WHERE B.Order_ID = A.Order_ID
AND B.Job_Number = A.Job_Number);
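One thing worth checking with this approach: DISTINCT here applies to all three selected columns, so it only collapses rows that match on Supplier_Name as well. The sketch below uses made-up sample rows (the supplier names are hypothetical; the key comes from the error message) to show what that means.

```sql
-- Hypothetical sample rows: one exact duplicate and one row that differs
-- only in Supplier_Name.
SELECT DISTINCT
    Order_ID,
    Job_Number,
    Supplier_Name
FROM (VALUES
        (3, 'L154', 'Supplier X'),
        (3, 'L154', 'Supplier X'),
        (3, 'L154', 'Supplier Y')
     ) AS Sample (Order_ID, Job_Number, Supplier_Name);
```

This returns two rows for the key (3, L154), one per supplier, so if Temp_A contains pairs like that the insert would still violate Temp_Con. If every duplicate in Temp_A is an exact copy, DISTINCT alone is enough.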