Select rows that have the same features as another row

Time: 2014-05-06 12:13:25

Tags: sql performance sql-server-2008 group-by

I have the following table with 3 columns: Id, FeatureName and Value:

Id  FeatureName  Value
--  -----------  -----
1   AAA          10
1   ABB          12
1   BBB          12
2   AAA          15
2   ABB          12
2   ACD           7
3   AAA          10
3   ABB          12
3   CCC          12
.............

Each Id has several features, and each feature has a value for that Id.

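I need to write a query that returns the Ids whose features and values are exactly the same as those of a given Id, considering only the features whose names start with 'A'. For example, in the table above, searching with the features and values of Id = 1 returns Id = 3, which has the same features starting with 'A' and the same values for those features.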
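To make the requirement easier to reproduce, here is a minimal setup sketch built from the rows shown above. The column types and lengths are assumptions (the post does not state them), and the elided rows are left out.

```sql
-- Sample data from the question; only the rows shown in the post.
CREATE TABLE Table1 (
    Id          INT         NOT NULL,
    FeatureName VARCHAR(50) NOT NULL,  -- assumed type and length
    Value       INT         NOT NULL   -- assumed type
);

INSERT INTO Table1 (Id, FeatureName, Value) VALUES
    (1, 'AAA', 10), (1, 'ABB', 12), (1, 'BBB', 12),
    (2, 'AAA', 15), (2, 'ABB', 12), (2, 'ACD',  7),
    (3, 'AAA', 10), (3, 'ABB', 12), (3, 'CCC', 12);

-- Only the features starting with 'A' take part in the comparison.
SELECT Id, FeatureName, Value
FROM Table1
WHERE FeatureName LIKE 'A%'
ORDER BY Id, FeatureName;
```

With this data, Id = 1 and Id = 3 both have exactly (AAA, 10) and (ABB, 12) among their 'A' features, while Id = 2 has (AAA, 15), (ABB, 12) and (ACD, 7), so only Id = 3 should match Id = 1.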

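I have found several ways to do this, but all of them are very slow when the table has many rows (more than a few hundred thousand).

The best performance I have got so far is with the following query: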

select a2.Id
from (select a.FeatureName, a.Value
      from Table1 a
      where a.Id = 1) a1,
     (select a.Id, a.FeatureName, a.Value
      from Table1 a
      where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
and a1.Value = a2.Value
group by a2.Id
having count(*) = @nFeatures

intersect

select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*) = @nFeatures

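where @nFeatures is the number of features of Id = 1 whose names start with 'A'. I compute it before calling this query. I use the intersect to exclude Ids that match all of the 'A' features of Id = 1 but also have additional features whose names start with 'A'.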
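For reference, the count described above could be computed along these lines. This is only a sketch: the variable @Id is a hypothetical name introduced here, and the post does not show how @nFeatures was actually calculated.

```sql
-- Hypothetical variable holding the Id we search with.
DECLARE @Id INT = 1;
DECLARE @nFeatures INT;

-- Count the features of that Id whose names start with 'A'.
SELECT @nFeatures = COUNT(*)
FROM Table1
WHERE Id = @Id
  AND FeatureName LIKE 'A%';
```

For the sample rows in the question, Id = 1 has two such features (AAA and ABB), so @nFeatures would be 2.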

I think the slowest part is the second subquery:

select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%'

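But I don't know how to make it faster. Maybe I will have to experiment with indexes.

Any idea how I can write a fast query for this purpose?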

4 answers:

Answer 0: (score: 1)

So you want all rows whose FeatureName + Value combination is not unique? You can use EXISTS:

SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
           WHERE t.Id <> t2.ID
           AND   t.FeatureName = t2.FeatureName
           AND   t.Value       = t2.Value)

Demo

  

"How can I write a fast query for this purpose?"

If that is not fast enough, create an index on FeatureName + Value.
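The index suggested here might look like the following sketch. The index name is hypothetical, and including Id as a covering column is an assumption on my part rather than something the answer specifies.

```sql
-- Hypothetical index name; key columns follow the answer's suggestion.
CREATE NONCLUSTERED INDEX IX_Table1_FeatureName_Value
    ON dbo.Table1 (FeatureName, Value)
    INCLUDE (Id);  -- assumption: lets the query read Id from the index itself
```

Because LIKE 'A%' is a prefix pattern, the optimizer can use such an index to seek on FeatureName. Whether it actually speeds things up depends on how many rows start with 'A', so it is worth comparing execution plans before and after.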

Answer 1: (score: 0)

I tried a left join of MyTable with itself to select the Ids that have matching FeatureName and Value values. Here is the query:

with joined_set as
(
    SELECT
    mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
    from
    (
        select *
        from mytable
        where featurename like 'A%'
    ) mt1
    left join
    (
        select *
        from mytable
        where featurename like 'A%'
    ) mt2
    on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in 
    (select id 
        from joined_set 
        group by id 
        having SUM(
                CASE 
                    WHEN mt2_id is null THEN 1
                    ELSE 0
                END
                ) <> 0
    );

Here is the SQL Fiddle demo. It has one extra condition in the inline view mt2 so that the search is performed only for id = 1.

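Answer 2: (score: 0)

I'm a bit dense this morning, and I'm not sure whether you want only the Ids or... Here is my take on it... You could move the FeatureName like 'A%' predicate into the inner query to filter the data during the initial table scan.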

with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
                 MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'                 
order by FeatureName, Value, Id                 
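The variation mentioned at the start of this answer, with the LIKE 'A%' predicate moved into the inner query, could be sketched as follows. The table aliases t and d are introduced here for brevity; everything else follows the query above.

```sql
WITH dupFeatures (FeatureName, Value, dupCount) AS
(
    SELECT FeatureName, Value, COUNT(*) AS dupCount
    FROM MyTable
    WHERE FeatureName LIKE 'A%'      -- filter before grouping
    GROUP BY FeatureName, Value
    HAVING COUNT(*) > 1
)
SELECT t.Id, d.FeatureName, d.Value
FROM dupFeatures d
JOIN MyTable t
  ON t.FeatureName = d.FeatureName
 AND t.Value = d.Value
ORDER BY d.FeatureName, d.Value, t.Id;
```

Both forms should return the same rows, since the filter only touches a grouping column. The difference is that here the grouping works on the 'A' rows only.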

Answer 3: (score: 0)

The general solution is

With Rows As (
select id
     , FeatureName
     , Value
     , rows = Count(id) OVER (PARTITION BY id)
FROM   test
WHERE  FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM   Rows a
       INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName 
             and a.rows = b.rows
GROUP BY a.id, b.id
ORDER BY a.id, b.id

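To restrict the solution to a single group, just add a WHERE condition on a.ID to the main query. The CTE is needed to get the correct row count for each id.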
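A restricted version for one given Id might look like the sketch below. It is not the answer's exact query: the match on Value, the HAVING clause and the change from a.id < b.id to a.id <> b.id are my assumptions about what is needed to return only Ids with exactly the same 'A' features and values. It also assumes that each (id, FeatureName) pair appears at most once.

```sql
WITH Rows AS (
    SELECT id
         , FeatureName
         , Value
         , rows = COUNT(id) OVER (PARTITION BY id)  -- number of 'A' features per id
    FROM   test
    WHERE  FeatureName LIKE 'A%'
)
SELECT b.id
FROM   Rows a
       INNER JOIN Rows b
               ON a.id <> b.id                  -- assumption: allow ids below the given one
              AND a.FeatureName = b.FeatureName
              AND a.Value = b.Value             -- assumption: values must match too
              AND a.rows = b.rows               -- same number of 'A' features
WHERE  a.id = 1                                 -- the Id to search with
GROUP BY a.id, b.id
HAVING COUNT(*) = MAX(a.rows)                   -- assumption: every feature was matched
ORDER BY b.id;
```

With the sample rows from the question loaded into test, only id 3 satisfies all of these conditions for a.id = 1. Id 2 is excluded because it has three 'A' features rather than two.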

SQLFiddle demo. In the demo I changed the test data slightly, so that one Id has only one of the FeatureNames of Ids 1 and 3.