Implementing INTERSECT with UNION ALL and GROUP BY

Date: 2017-10-16 13:34:30

Tags: tsql intersect

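I came up with the following query to find the records common to two sets of data. Because my database contains a large number of records, it is hard for me to confirm that the query is correct.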

Is it valid to implement an INTERSECT between the Customers and Employees tables using UNION ALL and GROUP BY, as shown below?

    SELECT D.Country, D.Region, D.City
      FROM (SELECT DISTINCT Country, Region, City 
              FROM Customers
             UNION ALL
            SELECT DISTINCT Country, Region, City
              FROM Employees) AS D
    GROUP BY D.Country, D.Region, D.City
    HAVING COUNT(*) = 2;

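So is it correct to say that every record in the result of this query exists in the INTERSECT of the Customers and Employees tables, AND that every record in the INTERSECT of the Customers and Employees tables is also in the result of this query?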

1 answer:

Answer 0 (score: 2)

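> So is it correct to say that every record in the result of this query exists in the INTERSECT of Customers and Employees, AND that every record in the INTERSECT of Customers and Employees is also in the result of this query?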

Yes.

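...Yes, but it won't be as efficient, because you are removing duplicates three times instead of once. In your query, you are:

  1. Using DISTINCT to pull unique records from Employees
  2. Using DISTINCT to pull unique records from Customers
  3. Combining the two queries with UNION ALL
  4. Using GROUP BY in the outer query to filter the records retrieved in steps 1, 2 and 3

Using INTERSECT returns the same results, but more efficiently. To see for yourself, create the sample data below and run both queries: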

    use tempdb
    go
    if object_id('dbo.customers') is not null drop table dbo.customers;
    if object_id('dbo.employees') is not null drop table dbo.employees;
    
    create table dbo.customers
    (
      customerId int identity,
      country    varchar(50),
      region     varchar(50),
      city       varchar(100)
    );
    
    create table dbo.employees
    (
      employeeId int identity,
      country    varchar(50),
      region     varchar(50),
      city       varchar(100)
    );
    
    insert dbo.customers(country, region, city) 
    values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
    insert dbo.employees(country, region, city)
    values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
    

Run the following queries:

    SELECT D.Country, D.Region, D.City
    FROM 
    (
      SELECT DISTINCT Country, Region, City 
      FROM Customers
      UNION ALL
      SELECT DISTINCT Country, Region, City
      FROM Employees
    ) AS D
    GROUP BY D.Country, D.Region, D.City
    HAVING COUNT(*) = 2;
    
    SELECT Country, Region, City
    FROM dbo.customers
    INTERSECT
    SELECT Country, Region, City
    FROM dbo.employees;
    

Results:

    Country     Region     City
    ----------- ---------- ----------
    us          Midwest    Chicago
    
    Country     Region     City
    ----------- ---------- ----------
    us          Midwest    Chicago
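If you don't have a SQL Server instance at hand, the same comparison can be sketched with Python's built-in sqlite3 module. This is an assumption-laden port, not the original T-SQL:

- SQLite stands in for SQL Server.
- The dbo schema and identity columns are dropped.
- The Springfield rows with a NULL region are hypothetical test data. They show that GROUP BY and INTERSECT both treat NULLs as equal when matching rows.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server's tempdb (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (country TEXT, region TEXT, city TEXT);
    CREATE TABLE employees (country TEXT, region TEXT, city TEXT);
    INSERT INTO customers VALUES
        ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'), ('us', 'Midwest', 'Chicago'),
        ('us', NULL, 'Springfield');  -- hypothetical row with a NULL region
    INSERT INTO employees VALUES
        ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'), ('us', 'Midwest', 'Chicago'),
        ('us', NULL, 'Springfield');  -- same hypothetical row in the other table
""")

# The query from the question: DISTINCT per table, UNION ALL, then GROUP BY.
UNION_GROUP_BY = """
    SELECT country, region, city
    FROM (SELECT DISTINCT country, region, city FROM customers
          UNION ALL
          SELECT DISTINCT country, region, city FROM employees) AS d
    GROUP BY country, region, city
    HAVING COUNT(*) = 2
"""

# The built-in set operator.
INTERSECT = """
    SELECT country, region, city FROM customers
    INTERSECT
    SELECT country, region, city FROM employees
"""

def rows(sql):
    """Run a query and return its rows as a set (row order is irrelevant here)."""
    return set(conn.execute(sql).fetchall())

via_group_by = rows(UNION_GROUP_BY)
via_intersect = rows(INTERSECT)
print(via_group_by == via_intersect)  # True
```

Both queries return Chicago and the NULL-region Springfield row. This matches the answer's point that the two forms are equivalent when each table is de-duplicated first.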
    

If INTERSECT isn't an option, or you want a faster query, you can improve the query you posted in a few different ways, for example:

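Option 1: Let GROUP BY handle all of the de-duplication.

This is the same as what you posted, but without the DISTINCTs. It is only equivalent to INTERSECT when neither table contains duplicate (Country, Region, City) rows. A combination that appears twice in Customers and never in Employees also has COUNT(*) = 2 and would be returned. A combination that appears in both tables but three or more times in total would be missed.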

    SELECT D.Country, D.Region, D.City
    FROM 
    (
      SELECT Country, Region, City 
      FROM Customers
      UNION ALL
      SELECT Country, Region, City
      FROM Employees
    ) AS D
    GROUP BY D.Country, D.Region, D.City
    HAVING COUNT(*) = 2;
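Dropping the DISTINCTs is only safe when neither table contains duplicate rows, and that is easy to reproduce. The sketch below again uses Python's sqlite3 as a stand-in for SQL Server, which is an assumption. The data is hypothetical: Seattle appears twice in customers and never in employees, while Chicago appears twice in customers and once in employees.

It also shows one possible fix, which is not part of the original answer. Each branch of the UNION ALL is tagged with a source number, and HAVING COUNT(DISTINCT src) = 2 requires a row to come from both tables, so duplicates within one table no longer affect the count. The same tagged query should also work in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (country TEXT, region TEXT, city TEXT);
    CREATE TABLE employees (country TEXT, region TEXT, city TEXT);
    -- Hypothetical data: Seattle is duplicated in customers only,
    -- Chicago appears twice in customers and once in employees.
    INSERT INTO customers VALUES
        ('us', 'N/W', 'Seattle'), ('us', 'N/W', 'Seattle'),
        ('us', 'Midwest', 'Chicago'), ('us', 'Midwest', 'Chicago');
    INSERT INTO employees VALUES
        ('us', 'Midwest', 'Chicago'), ('us', 'S/E', 'Miami');
""")

def rows(sql):
    """Run a query and return its rows as a set."""
    return set(conn.execute(sql).fetchall())

# Option 1 as posted: no DISTINCT, so COUNT(*) also counts duplicate rows.
option1 = rows("""
    SELECT country, region, city
    FROM (SELECT country, region, city FROM customers
          UNION ALL
          SELECT country, region, city FROM employees) AS d
    GROUP BY country, region, city
    HAVING COUNT(*) = 2
""")

# Source-tagged variant: count how many *tables* contain each combination.
tagged = rows("""
    SELECT country, region, city
    FROM (SELECT country, region, city, 1 AS src FROM customers
          UNION ALL
          SELECT country, region, city, 2 AS src FROM employees) AS d
    GROUP BY country, region, city
    HAVING COUNT(DISTINCT src) = 2
""")

expected = rows("""
    SELECT country, region, city FROM customers
    INTERSECT
    SELECT country, region, city FROM employees
""")

print(option1)             # {('us', 'N/W', 'Seattle')}: wrong row, Chicago missing
print(tagged == expected)  # True
```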
    

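Option 2: Use ROW_NUMBER.

This would be my preference, and it is probably the most efficient. Because the rn = 2 filter keeps every combination that occurs at least twice across both tables, it makes the same assumption as Option 1: neither table may contain duplicate (Country, Region, City) rows.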

    SELECT Country, Region, City
    FROM 
    (
      SELECT
        rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)), 
        D.Country, D.Region, D.City
      FROM 
      (
        SELECT Country, Region, City 
        FROM Customers
        UNION ALL
        SELECT Country, Region, City
        FROM Employees
      ) AS D
    ) uniquify
    WHERE rn = 2;
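Like Option 1, the rn = 2 filter assumes duplicate-free tables, and the same kind of check shows why. This sketch again assumes SQLite via Python's sqlite3 module, and window functions need SQLite 3.25 or later. Two T-SQL details change in the port:

- SQLite accepts row_number() without an ORDER BY, so the T-SQL order by (SELECT null) workaround is dropped.
- The AS rn alias replaces T-SQL's rn = ... syntax.

On the answer's sample data the query matches INTERSECT. After a second, hypothetical Seattle customer is added, with still no Seattle employee, rn = 2 also returns Seattle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (country TEXT, region TEXT, city TEXT);
    CREATE TABLE employees (country TEXT, region TEXT, city TEXT);
    INSERT INTO customers VALUES
        ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'), ('us', 'Midwest', 'Chicago');
    INSERT INTO employees VALUES
        ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'), ('us', 'Midwest', 'Chicago');
""")

# Option 2 ported to SQLite: number the rows of each combination, keep the 2nd.
OPTION2 = """
    SELECT country, region, city
    FROM (SELECT row_number() OVER (PARTITION BY country, region, city) AS rn,
                 country, region, city
          FROM (SELECT country, region, city FROM customers
                UNION ALL
                SELECT country, region, city FROM employees) AS d) AS uniquify
    WHERE rn = 2
"""
INTERSECT = """
    SELECT country, region, city FROM customers
    INTERSECT
    SELECT country, region, city FROM employees
"""

def rows(sql):
    """Run a query and return its rows as a set."""
    return set(conn.execute(sql).fetchall())

clean_match = rows(OPTION2) == rows(INTERSECT)

# Hypothetical duplicate: a second Seattle customer, still no Seattle employee.
conn.execute("INSERT INTO customers VALUES ('us', 'N/W', 'Seattle')")
dup_option2 = rows(OPTION2)
dup_intersect = rows(INTERSECT)

print(clean_match)                  # True
print(dup_option2 - dup_intersect)  # {('us', 'N/W', 'Seattle')}
```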