How to get uniquely matched records (see example)

Date: 2015-03-23 13:34:44

Tags: sql-server

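This example is only meant to illustrate the problem and does not look like the actual data I'm working with, but anything resembling the real data would quickly become very complicated.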

Suppose I have these two sets (drivers and vehicles):

id          name license
----------- ---- -----------
1           Joe  1
2           Eric 1
3           Jane 2
4           Mike 2

id          name     license
----------- -------- -----------
11          Van #1   1
12          Van #2   1
13          Truck #1 2
14          Truck #2 2

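I want to find one driver for each vehicle who is qualified to drive it. (For the sake of the example, each license is specific to one type of vehicle: being qualified to drive a truck, for instance, does not qualify a driver to drive a van.) So the desired result looks like this: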

driver_id   driver_name driver_license vehicle_id  vehicle_name vehicle_license
----------- ----------- -------------- ----------- ------------ ---------------
1           Joe         1              11          Van #1       1
2           Eric        1              12          Van #2       1
3           Jane        2              13          Truck #1     2
4           Mike        2              14          Truck #2     2
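The matching rule above boils down to: within each license, order drivers and vehicles by id and pair the k-th driver with the k-th vehicle. A minimal in-memory Python sketch of that rule, assuming the data fits in memory and that a driver left without a vehicle should be paired with nothing (as the LEFT JOIN in the query below does); `pair_by_license` is a hypothetical helper, not part of the question:

```python
from collections import defaultdict

# Sample data from the question: (id, name, license)
drivers = [(1, "Joe", 1), (2, "Eric", 1), (3, "Jane", 2), (4, "Mike", 2)]
vehicles = [(11, "Van #1", 1), (12, "Van #2", 1), (13, "Truck #1", 2), (14, "Truck #2", 2)]


def pair_by_license(drivers, vehicles):
    """Pair the k-th driver (by id) with the k-th vehicle (by id) that has the same license.

    Drivers with no k-th vehicle are paired with None, like a LEFT JOIN;
    surplus vehicles are left out.
    """
    vehicles_by_license = defaultdict(list)
    for vehicle in sorted(vehicles):  # tuples sort by id first; ids are unique
        vehicles_by_license[vehicle[2]].append(vehicle)

    pairs = []
    position = defaultdict(int)  # next slot per license, i.e. ROW_NUMBER() - 1
    for driver in sorted(drivers):
        lic = driver[2]
        k = position[lic]
        position[lic] += 1
        candidates = vehicles_by_license[lic]
        pairs.append((driver, candidates[k] if k < len(candidates) else None))
    return pairs


for driver, vehicle in pair_by_license(drivers, vehicles):
    print(driver, vehicle)
```

Because every driver and vehicle is visited once after sorting, this runs in O(n log n), which is also roughly what a database does for the window-function version.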

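I've been able to get this result with the following query, but it seems like it could get slow with larger sets. Is there another (better) way to get the same result?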

select d.id driver_id
       ,d.name driver_name
       ,d.license driver_license
       ,v.id vehicle_id
       ,v.name vehicle_name
       ,v.license vehicle_license
    from (select id
               ,name
               ,license
               ,rank() over (partition by license order by id) rank_driver
            from ( values ( 1, 'Joe', 1), 
                      ( 2, 'Eric', 1), 
                      ( 3, 'Jane', 2), 
                      ( 4, 'Mike', 2) ) driver (id, name, license)) d
   left join (select id
                  ,name
                  ,license
                  ,rank() over (partition by license order by id) rank_vehicle
            from ( values ( 11, 'Van #1', 1) , 
                      ( 12, 'Van #2', 1), 
                      ( 13, 'Truck #1', 2), 
                      ( 14, 'Truck #2', 2) ) vehicle (id, name, license)) v 
on d.license = v.license and d.rank_driver = v.rank_vehicle
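To see what the query does, here is a sketch that runs the same pairing logic through Python's sqlite3 module. Assumptions: SQLite 3.25 or later (which has window functions) stands in for SQL Server, and the VALUES lists are loaded into plain tables named drivers and vehicles, because SQLite does not accept column aliases on a VALUES derived table. An extra driver with no matching vehicle shows the effect of the LEFT JOIN.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) stands in for SQL Server.
# The question's VALUES lists are loaded into plain tables instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers  (id INTEGER PRIMARY KEY, name TEXT, license INTEGER);
CREATE TABLE vehicles (id INTEGER PRIMARY KEY, name TEXT, license INTEGER);
INSERT INTO drivers  VALUES (1, 'Joe', 1), (2, 'Eric', 1), (3, 'Jane', 2), (4, 'Mike', 2);
INSERT INTO vehicles VALUES (11, 'Van #1', 1), (12, 'Van #2', 1),
                            (13, 'Truck #1', 2), (14, 'Truck #2', 2);
""")

# Same shape as the question's query: number each side within its license, then join on
# (license, number). The LEFT JOIN keeps drivers that have no vehicle.
PAIRING_QUERY = """
SELECT d.id, d.name, d.license, v.id, v.name, v.license
FROM (SELECT id, name, license,
             RANK() OVER (PARTITION BY license ORDER BY id) AS rank_driver
      FROM drivers) d
LEFT JOIN (SELECT id, name, license,
                  RANK() OVER (PARTITION BY license ORDER BY id) AS rank_vehicle
           FROM vehicles) v
  ON d.license = v.license AND d.rank_driver = v.rank_vehicle
ORDER BY d.id
"""

for row in conn.execute(PAIRING_QUERY):
    print(row)

# A third license-1 driver: only two vans exist, so Ann's vehicle columns come back NULL.
conn.execute("INSERT INTO drivers VALUES (5, 'Ann', 1)")
print(conn.execute(PAIRING_QUERY).fetchall()[-1])  # -> (5, 'Ann', 1, None, None, None)
```

Because id is unique, RANK() and ROW_NUMBER() produce the same numbers here. If the ORDER BY key could contain duplicates, RANK() would give tied rows the same number and one vehicle could be paired with several drivers, so ROW_NUMBER() is the safer choice.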

2 Answers:

Answer 0 (score: 0)

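A question is much better when you include DDL scripts for your tables and scripts that insert the sample data. If you are having performance problems, you need to add appropriate indexes, for example: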

-- key order (license, name) matches PARTITION BY license ORDER BY name used below
CREATE NONCLUSTERED INDEX ix_drivers ON drivers (license, name) INCLUDE (id);
CREATE NONCLUSTERED INDEX ix_vehicles ON vehicles (license, name) INCLUDE (id);

CREATE TABLE #drivers
(
    id INT, name VARCHAR(100), license int
);

CREATE TABLE #vehicles
(
    id INT, name VARCHAR(100), license int
);

INSERT INTO #drivers 
        ( id, name, license )
VALUES
(1,           'Joe',  1),
(2,           'Eric', 1),
(3,           'Jane', 2),
(4,           'Mike', 2);

INSERT INTO #vehicles
        ( id, name, license )
VALUES
(11,          'Van #1',   1),
(12,          'Van #2',   1),
(13,          'Truck #1', 2),
(14,          'Truck #2', 2);

SELECT a.id AS driver_id, a.name AS driver_name, a.license AS driver_license,
       b.id AS vehicle_id, b.name AS vehicle_name, b.license AS vehicle_license
FROM 
(
SELECT id, name, license, ROW_NUMBER() OVER (PARTITION BY license ORDER BY name) AS rownum
FROM #drivers
) a
JOIN
(
SELECT id, name, license, ROW_NUMBER() OVER (PARTITION BY license ORDER BY name) AS rownum
FROM #vehicles
) b 
ON a.license = b.license
AND a.rownum = b.rownum
ORDER BY 1
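This answer differs from the question's query in two ways: it numbers rows with ORDER BY name instead of ORDER BY id, and it uses an inner JOIN instead of a LEFT JOIN. A sketch comparing the variants, again assuming SQLite 3.25+ as a stand-in for SQL Server and plain tables in place of the #temp tables; `pair_rows` and its parameters are hypothetical:

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; plain tables replace #drivers/#vehicles.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers  (id INTEGER, name TEXT, license INTEGER);
CREATE TABLE vehicles (id INTEGER, name TEXT, license INTEGER);
INSERT INTO drivers  VALUES (1, 'Joe', 1), (2, 'Eric', 1), (3, 'Jane', 2), (4, 'Mike', 2);
INSERT INTO vehicles VALUES (11, 'Van #1', 1), (12, 'Van #2', 1),
                            (13, 'Truck #1', 2), (14, 'Truck #2', 2);
""")


def pair_rows(order_by, join_type):
    """Hypothetical helper: ROW_NUMBER pairing with a chosen ORDER BY column and join type.

    Both arguments are spliced into the SQL text, so only whitelisted constants are accepted.
    """
    if order_by not in ("id", "name") or join_type not in ("INNER", "LEFT"):
        raise ValueError("unsupported variant")
    sql = f"""
    SELECT a.name, b.name
    FROM (SELECT id, name, license,
                 ROW_NUMBER() OVER (PARTITION BY license ORDER BY {order_by}) AS rownum
          FROM drivers) a
    {join_type} JOIN
         (SELECT id, name, license,
                 ROW_NUMBER() OVER (PARTITION BY license ORDER BY {order_by}) AS rownum
          FROM vehicles) b
      ON a.license = b.license AND a.rownum = b.rownum
    ORDER BY a.id
    """
    return conn.execute(sql).fetchall()


print(pair_rows("name", "INNER"))  # this answer's variant
print(pair_rows("id", "LEFT"))     # the question's variant

# Add a third license-1 driver; there are only two vans.
conn.execute("INSERT INTO drivers VALUES (5, 'Ann', 1)")
print(len(pair_rows("name", "INNER")), len(pair_rows("name", "LEFT")))
```

Ordering by name still yields a valid one-to-one assignment, but not the one in the desired output (Eric gets Van #1 instead of Joe). With the inner JOIN, a driver left without a vehicle disappears from the result rather than appearing with NULLs.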

Answer 1 (score: 0)

Your solution is fine. I would write it as:

select d.id driver_id, d.name driver_name, d.license driver_license,
       v.id vehicle_id, v.name vehicle_name, v.license vehicle_license
from (select d.*,
             row_number() over (partition by license order by id) as rank_driver
      from drivers d
     ) d left join
     (select v.*,
             row_number() over (partition by license order by id) as rank_vehicle
      from vehicles v
     ) v 
     on d.license = v.license and d.rank_driver = v.rank_vehicle

If you are worried about scalability, I would suggest indexes on the tables: drivers(license, id) and vehicles(license, id).
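A sketch of the suggested indexes, assuming SQLite as a stand-in for SQL Server (the statement `CREATE INDEX ... ON drivers (license, id)` is valid in both). The key order (license, id) matches PARTITION BY license ORDER BY id, so an engine can read rows already in window order instead of sorting them. Whether it actually does so is up to the optimizer, so the plan is only printed, not checked. The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers  (id INTEGER, name TEXT, license INTEGER);
CREATE TABLE vehicles (id INTEGER, name TEXT, license INTEGER);
INSERT INTO drivers  VALUES (1, 'Joe', 1), (2, 'Eric', 1), (3, 'Jane', 2), (4, 'Mike', 2);
INSERT INTO vehicles VALUES (11, 'Van #1', 1), (12, 'Van #2', 1),
                            (13, 'Truck #1', 2), (14, 'Truck #2', 2);
-- Hypothetical index names; key order follows PARTITION BY license ORDER BY id
CREATE INDEX ix_drivers_license_id  ON drivers  (license, id);
CREATE INDEX ix_vehicles_license_id ON vehicles (license, id);
""")

# Answer 1's query shape, with SELECT x.* inside the subqueries.
QUERY = """
SELECT d.id, v.id
FROM (SELECT dr.*, ROW_NUMBER() OVER (PARTITION BY license ORDER BY id) AS rn
      FROM drivers dr) d
LEFT JOIN (SELECT ve.*, ROW_NUMBER() OVER (PARTITION BY license ORDER BY id) AS rn
           FROM vehicles ve) v
  ON d.license = v.license AND d.rn = v.rn
ORDER BY d.id
"""

# Plan text differs between SQLite versions, so it is printed for inspection only.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(plan_row[-1])

rows = conn.execute(QUERY).fetchall()
print(rows)
```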

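Although avoiding * in the outer query is good practice, doing so in subqueries is overkill, unless you are creating something precompiled that may persist for a long time, such as a prepared statement or a view. The database itself will optimize the query to read only the columns it needs. (That is not true in MySQL, which materializes subqueries, but that is another matter.)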