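I have come up with the following query to find the records common to two sets of data, but because my database holds a large number of records, I am finding it hard to verify that the query is correct.
Is it OK to implement an Intersect between the "Customers" and "Employees" tables using UNION ALL and GROUP BY, as shown below?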
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
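So, is it correct to say that any record in the result of this query is also in the Intersect set of the "Customers" and "Employees" tables, AND that any record in the Intersect set of the "Customers" and "Employees" tables is also in the result of this query?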
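Answer 0: (score: 2)
"So is it correct to say that any record in the result of this query is in the "Intersect" set between "Customers" & "Employees" "AND" any record that exists in the "Intersect" set between "Customers" & "Employees" is in the result of this query too?"
Yes.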
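...Yes, but it won't be as efficient, because you are filtering out duplicates three times instead of once. In your query you are removing duplicates with DISTINCT in each branch of the UNION ALL, and then once more with GROUP BY.
Using INTERSECT returns the same result, but more efficiently. To see for yourself, you can create the sample data below and run both queries: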
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees(country, region, city)
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run the following queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
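Comparing two result grids by eye stops working once the tables are large, which was the original concern. One way to check the equivalence on real data is to compute the symmetric difference of the two queries with EXCEPT. The following is only a sketch: the CTE names viaGroupBy and viaIntersect and the issue column are made up for illustration, and it assumes the dbo.customers and dbo.employees sample tables from above.

```sql
-- Sketch: list rows returned by one query but not by the other.
-- An empty result means both queries agree on the current data.
WITH viaGroupBy AS          -- hypothetical name: the UNION ALL + GROUP BY version
(
    SELECT D.Country, D.Region, D.City
    FROM
    (
        SELECT DISTINCT Country, Region, City FROM dbo.customers
        UNION ALL
        SELECT DISTINCT Country, Region, City FROM dbo.employees
    ) AS D
    GROUP BY D.Country, D.Region, D.City
    HAVING COUNT(*) = 2
),
viaIntersect AS             -- hypothetical name: the INTERSECT version
(
    SELECT Country, Region, City FROM dbo.customers
    INTERSECT
    SELECT Country, Region, City FROM dbo.employees
)
SELECT 'only in GROUP BY version' AS issue, x.Country, x.Region, x.City
FROM
(
    SELECT Country, Region, City FROM viaGroupBy
    EXCEPT
    SELECT Country, Region, City FROM viaIntersect
) AS x
UNION ALL
SELECT 'only in INTERSECT version', y.Country, y.Region, y.City
FROM
(
    SELECT Country, Region, City FROM viaIntersect
    EXCEPT
    SELECT Country, Region, City FROM viaGroupBy
) AS y;
```

If this returns no rows against the full tables, the two queries produced the same set for that data. It does not prove equivalence in general, but it removes the need to inspect the output manually.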
If using INTERSECT is not an option, or you want a faster query, you can improve the query you posted in a few different ways, for example:
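Option 1: Let GROUP BY handle all of the de-duplication, like this:
This is the same as what you posted, but without the DISTINCTs. Because a combination can now appear more than once per table, each branch is tagged with its source and the HAVING clause counts distinct sources instead of rows.
SELECT D.Country, D.Region, D.City
FROM
(
    -- src records which table each row came from
    SELECT 'C' AS src, Country, Region, City
    FROM Customers
    UNION ALL
    SELECT 'E' AS src, Country, Region, City
    FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
-- keep only the groups that received rows from both tables
HAVING COUNT(DISTINCT D.src) = 2;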
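A caveat worth testing whenever the DISTINCTs are removed: COUNT(*) counts rows, not source tables, so a combination that occurs twice in one table and never in the other also reaches a count of 2. The sketch below illustrates this with two throwaway temp tables. The names #c and #e and the src column are hypothetical and not part of the original sample data.

```sql
-- Hypothetical temp tables, for illustration only.
CREATE TABLE #c (Country varchar(50), Region varchar(50), City varchar(100));
CREATE TABLE #e (Country varchar(50), Region varchar(50), City varchar(100));

-- Two customers in Seattle and no employee there; Chicago is in both tables.
INSERT #c VALUES ('us', 'N/W', 'Seattle'), ('us', 'N/W', 'Seattle'), ('us', 'Midwest', 'Chicago');
INSERT #e VALUES ('us', 'Midwest', 'Chicago');

-- Counting rows: Seattle also reaches 2, so it is returned next to Chicago
-- even though it is not in the intersection.
SELECT D.Country, D.Region, D.City
FROM
(
    SELECT Country, Region, City FROM #c
    UNION ALL
    SELECT Country, Region, City FROM #e
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;

-- Counting distinct sources: only groups fed by both tables qualify,
-- so only Chicago is returned.
SELECT D.Country, D.Region, D.City
FROM
(
    SELECT 'C' AS src, Country, Region, City FROM #c
    UNION ALL
    SELECT 'E' AS src, Country, Region, City FROM #e
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(DISTINCT D.src) = 2;

DROP TABLE #c, #e;
```

The three-row sample data used earlier has no duplicates within a table, which is why it cannot expose this difference.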
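Option 2: Use ROW_NUMBER
This would be my preference, and it may turn out to be the most efficient; compare the execution plans on your own data. The DISTINCTs stay in here, so that a combination can reach rn = 2 only when it is present in both tables.
SELECT Country, Region, City
FROM
(
    SELECT
        -- numbers the rows within each (Country, Region, City) combination
        rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
        D.Country, D.Region, D.City
    FROM
    (
        SELECT DISTINCT Country, Region, City
        FROM Customers
        UNION ALL
        SELECT DISTINCT Country, Region, City
        FROM Employees
    ) AS D
) uniquify
WHERE rn = 2;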
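One more edge case that is easy to miss when checking any of these rewrites is NULL, for example a missing Region. INTERSECT treats two NULLs as equal when comparing rows, and GROUP BY places NULLs in the same group, so the two approaches should still agree. The following sketch checks that on a single row. The temp table names #custNull and #empNull are hypothetical.

```sql
-- Hypothetical temp tables, for illustration only.
CREATE TABLE #custNull (Country varchar(50), Region varchar(50), City varchar(100));
CREATE TABLE #empNull  (Country varchar(50), Region varchar(50), City varchar(100));

-- The same location in both tables, with no Region recorded.
INSERT #custNull VALUES ('uk', NULL, 'London');
INSERT #empNull  VALUES ('uk', NULL, 'London');

-- INTERSECT: the row is returned, because NULLs compare as equal here.
SELECT Country, Region, City FROM #custNull
INTERSECT
SELECT Country, Region, City FROM #empNull;

-- UNION ALL + GROUP BY: both rows land in one group, so the row is returned as well.
SELECT D.Country, D.Region, D.City
FROM
(
    SELECT DISTINCT Country, Region, City FROM #custNull
    UNION ALL
    SELECT DISTINCT Country, Region, City FROM #empNull
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;

DROP TABLE #custNull, #empNull;
```

A rewrite built on equality comparisons, such as a join on Region = Region, would not match this row, so it would not be a drop-in replacement for INTERSECT when the columns are nullable.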