I have two tables with the following columns. Each table has about 10 million rows.
Table A

| Id | Name  | Salary |
|----|-------|--------|
| 1  | TEST1 | 100    |
| 2  | TEST2 | 200    |
| 3  | TEST3 | 300    |

Table B

| Id | Name  | Salary |
|----|-------|--------|
| 1  | TEST1 | 100    |
| 2  | TEST2 | 200    |
| 4  | TEST4 | 400    |
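What I want to do is remove the rows that are identical in both tables, then compare what is left to find the differences between the two tables based on the primary key. Assume that id is my primary key.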
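I used the following queries. They eliminate the identical rows, join the remaining rows on the primary key, and return the data from each table as a separate result set. I then export the results to CSV or Excel so I can check the differences with a text-comparison tool.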
```sql
SELECT T1.*
FROM (SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2) T1
JOIN (SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1) T2 ON T1.id = T2.id

SELECT T2.*
FROM (SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2) T1
JOIN (SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1) T2 ON T1.id = T2.id
```
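To make the semantics of these queries concrete, here is a minimal sketch that runs the first one against the sample rows from Table A and Table B. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for SQL Server, and assumes Table A and Table B correspond to `TABLE1` and `TABLE2`; it says nothing about tempdb usage. On this sample, the EXCEPT steps leave id 3 on one side and id 4 on the other, and because neither id exists in the other table, the join returns no rows: the query only reports ids present in both tables whose name or salary differs.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; TABLE1/TABLE2 hold
# the sample rows from Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
CREATE TABLE TABLE2 (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO TABLE1 VALUES (1, 'TEST1', 100), (2, 'TEST2', 200), (3, 'TEST3', 300);
INSERT INTO TABLE2 VALUES (1, 'TEST1', 100), (2, 'TEST2', 200), (4, 'TEST4', 400);
""")

# The two EXCEPT halves: whole rows present on one side but not the other.
only_in_1 = conn.execute("SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2").fetchall()
only_in_2 = conn.execute("SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1").fetchall()

# The first query from the question: keep only leftovers whose id is on both sides.
changed = conn.execute("""
SELECT T1.*
FROM (SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2) T1
JOIN (SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1) T2 ON T1.id = T2.id
""").fetchall()

print(only_in_1)  # [(3, 'TEST3', 300)]
print(only_in_2)  # [(4, 'TEST4', 400)]
print(changed)    # []
```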
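It works for small data sets, but on large ones it fails with the following error message:

The transaction log for database 'tempdb' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases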
I would like to know
Answer 0 (score: 2)
Try this...
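```sql
-- Rows in Table1 whose id does not appear in Table2
SELECT T1.*
INTO #tmp1
FROM Table1 T1
LEFT JOIN Table2 T2 ON T1.id = T2.id
WHERE T2.id IS NULL

-- Rows in Table2 whose id does not appear in Table1
SELECT T2.*
INTO #tmp2
FROM Table2 T2
LEFT JOIN Table1 T1 ON T2.id = T1.id
WHERE T1.id IS NULL

-- Combine both sets into one result
SELECT *
FROM #tmp1
UNION ALL
SELECT *
FROM #tmp2

DROP TABLE #tmp1
DROP TABLE #tmp2
```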
It works... you should try this...
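The `LEFT JOIN ... WHERE ... IS NULL` pattern in this answer is an anti-join: it returns rows whose id exists in only one of the two tables, not rows whose id exists in both tables with different values. A minimal sketch on the question's sample data shows this. It assumes SQLite through Python's `sqlite3` module as a stand-in for SQL Server; because SQLite has no `#temp` tables or `SELECT ... INTO`, the two halves are combined directly with `UNION ALL`.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server, loaded with the
# sample rows from Table A (Table1) and Table B (Table2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
CREATE TABLE Table2 (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO Table1 VALUES (1, 'TEST1', 100), (2, 'TEST2', 200), (3, 'TEST3', 300);
INSERT INTO Table2 VALUES (1, 'TEST1', 100), (2, 'TEST2', 200), (4, 'TEST4', 400);
""")

# Both anti-joins from the answer, unioned without temp tables.
rows = conn.execute("""
SELECT T1.* FROM Table1 T1 LEFT JOIN Table2 T2 ON T1.id = T2.id WHERE T2.id IS NULL
UNION ALL
SELECT T2.* FROM Table2 T2 LEFT JOIN Table1 T1 ON T2.id = T1.id WHERE T1.id IS NULL
""").fetchall()

print(sorted(rows))  # [(3, 'TEST3', 300), (4, 'TEST4', 400)]
```

These are exactly the rows that the question's EXCEPT-plus-JOIN query discards, so the two approaches answer different questions.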
Answer 1 (score: 1)
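Look at and run the following script. It produces 4 result sets: the first two are equal, and the last two are also equal. I checked the execution plans of the rewritten queries, and they are simpler. Most likely they won't use tempdb.

It would also be wise to add an INDEX on the id column. That will certainly speed things up.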
```sql
-- Sample data: ids 2, 3 and 4 differ; id 7 exists only in #T1 and id 5 only in #T2
CREATE TABLE #T1(id INT NOT NULL PRIMARY KEY, name NVARCHAR(256), salary NUMERIC(28,2));
CREATE TABLE #T2(id INT NOT NULL PRIMARY KEY, name NVARCHAR(256), salary NUMERIC(28,2));

INSERT INTO #T1(id, name, salary) VALUES (1,N'TT',25000),(2,N'Michael',25000),(3,N'Zara',30000),(4,N'Pol',60000),(7,N'Brad',25000);
INSERT INTO #T2(id, name, salary) VALUES (1,N'TT',25000),(2,N'Templeton',25000),(3,N'Zara',25000),(4,N'Jack',60000),(5,N'Pippa',25000);

-- Result set 1 (rewrite): rows of #T1 whose id is in #T2 with a different name or salary
SELECT T1.*
FROM #T1 AS T1
INNER JOIN #T2 AS T2 ON T2.id = T1.id
WHERE T2.name <> T1.name OR
      T2.salary <> T1.salary;

-- Result set 2 (original EXCEPT-based query): same rows as result set 1 on this data
SELECT TT1.*
FROM (SELECT * FROM #T1 EXCEPT SELECT * FROM #T2) AS TT1
JOIN (SELECT * FROM #T2 EXCEPT SELECT * FROM #T1) AS TT2 ON TT2.id = TT1.id;

-- Result set 3 (rewrite): the matching rows from #T2
SELECT T2.*
FROM #T2 AS T2
INNER JOIN #T1 AS T1 ON T1.id = T2.id
WHERE T1.name <> T2.name OR
      T1.salary <> T2.salary;

-- Result set 4 (original EXCEPT-based query): same rows as result set 3 on this data
SELECT TT2.*
FROM (SELECT * FROM #T1 EXCEPT SELECT * FROM #T2) AS TT1
JOIN (SELECT * FROM #T2 EXCEPT SELECT * FROM #T1) AS TT2 ON TT2.id = TT1.id;

DROP TABLE #T1;
DROP TABLE #T2;
```
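The claim that result sets 1 and 2 match can be checked with a minimal sketch that reruns that pair on the same sample rows. It assumes SQLite through Python's `sqlite3` module as a stand-in for SQL Server, with plain tables `T1`/`T2` in place of `#T1`/`#T2`, so it cannot reproduce the execution plans or tempdb behaviour. The second half adds a hypothetical row with a NULL salary to show one case where the rewrite is not equivalent: `<>` against NULL evaluates to UNKNOWN, so the JOIN version drops the row, while EXCEPT treats NULL as distinct from any non-NULL value and keeps it. SQL Server (with the default ANSI_NULLS ON) follows the same NULL rules for `<>` and EXCEPT.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; T1/T2 replace #T1/#T2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (id INTEGER NOT NULL PRIMARY KEY, name TEXT, salary NUMERIC);
CREATE TABLE T2 (id INTEGER NOT NULL PRIMARY KEY, name TEXT, salary NUMERIC);
INSERT INTO T1 VALUES (1,'TT',25000),(2,'Michael',25000),(3,'Zara',30000),(4,'Pol',60000),(7,'Brad',25000);
INSERT INTO T2 VALUES (1,'TT',25000),(2,'Templeton',25000),(3,'Zara',25000),(4,'Jack',60000),(5,'Pippa',25000);
""")

# Result set 1: the JOIN-based rewrite.
JOIN_QUERY = """
SELECT T1.* FROM T1 INNER JOIN T2 ON T2.id = T1.id
WHERE T2.name <> T1.name OR T2.salary <> T1.salary
"""
# Result set 2: the original EXCEPT-based query.
EXCEPT_QUERY = """
SELECT TT1.*
FROM (SELECT * FROM T1 EXCEPT SELECT * FROM T2) AS TT1
JOIN (SELECT * FROM T2 EXCEPT SELECT * FROM T1) AS TT2 ON TT2.id = TT1.id
"""

def run(sql):
    # Sort so the comparison does not depend on row order.
    return sorted(conn.execute(sql).fetchall())

join_rows = run(JOIN_QUERY)
except_rows = run(EXCEPT_QUERY)
print(join_rows)                 # [(2, 'Michael', 25000), (3, 'Zara', 30000), (4, 'Pol', 60000)]
print(join_rows == except_rows)  # True

# Hypothetical extra row: salary is NULL in T1 but 40000 in T2.
conn.execute("INSERT INTO T1 VALUES (8, 'Kim', NULL)")
conn.execute("INSERT INTO T2 VALUES (8, 'Kim', 40000)")
join_ids = [r[0] for r in run(JOIN_QUERY)]
except_ids = [r[0] for r in run(EXCEPT_QUERY)]
print(8 in join_ids)    # False: NULL <> 40000 is UNKNOWN, so WHERE filters the row out
print(8 in except_ids)  # True: EXCEPT treats NULL as distinct from 40000
```

If name or salary can be NULL in the real tables, the WHERE clause of the rewrite needs explicit NULL handling for each nullable column (for example `OR (T1.salary IS NULL AND T2.salary IS NOT NULL) OR (T1.salary IS NOT NULL AND T2.salary IS NULL)`) before it returns the same rows as the EXCEPT version.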