I have the following table with three columns: Id, FeatureName and Value:
Id FeatureName Value
-- ----------- -----
1 AAA 10
1 ABB 12
1 BBB 12
2 AAA 15
2 ABB 12
2 ACD 7
3 AAA 10
3 ABB 12
3 CCC 12
.............
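Each Id has a different set of features, and each feature has a value for that Id.
I need to write a query that returns the Ids whose features and values are exactly the same as those of a given Id, considering only the features whose names start with 'A'. For example, in the table above, searching with the features of Id = 1 should return Id = 3, which has the same features starting with 'A' and the same values for those features.
I have found several ways to do this, but all of them are very slow when the table has many rows (more than a few hundred thousand).
The best performance I have obtained so far is with the following query: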
select a2.Id
from (select a.FeatureName, a.Value
      from Table1 a
      where a.Id = 1) a1,
     (select a.Id, a.FeatureName, a.Value
      from Table1 a
      where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
  and a1.Value = a2.Value
group by a2.Id
having count(*) = @nFeatures
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*) = @nFeatures
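where @nFeatures is the number of features in Id = 1 whose names start with 'A' (2 for the sample data). I count them before calling this query. The intersect is there to exclude Ids that match every 'A' feature of Id = 1 but also have additional features whose names start with 'A'.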
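To make the two steps concrete (count the 'A' features of the target Id, then run the intersect query), here is a minimal sketch that runs the same SQL against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for the real server only so the example is self-contained; the names target_id, n_features and matching_ids are mine, and the bound parameter :n plays the role of @nFeatures.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [
    (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
    (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
    (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
])

target_id = 1  # hypothetical name for the Id being searched

# Step 1: count the features of the target Id whose names start with 'A'.
n_features = conn.execute(
    "SELECT COUNT(*) FROM Table1 WHERE Id = ? AND FeatureName LIKE 'A%'",
    (target_id,),
).fetchone()[0]

# Step 2: the query from the question, with :n in place of @nFeatures.
query = """
select a2.Id
from (select a.FeatureName, a.Value
      from Table1 a
      where a.Id = :target) a1,
     (select a.Id, a.FeatureName, a.Value
      from Table1 a
      where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
  and a1.Value = a2.Value
group by a2.Id
having count(*) = :n
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*) = :n
"""
params = {"target": target_id, "n": n_features}
matching_ids = sorted(row[0] for row in conn.execute(query, params))
print(n_features, matching_ids)
```

Note that on the sample data the target Id itself is part of the result next to Id = 3, because Id = 1 trivially matches its own features; an extra condition such as a2.Id <> 1 would drop it.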
I think the slowest part is the second subquery:
select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%'
But I don't know how to make it faster. Maybe I will have to play with indexes.
Any idea how to write a fast query for this purpose?
Answer 0 (score: 1)
So you want all rows where the combination of FeatureName and Value is not unique? You can use EXISTS:
SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
WHERE t.Id <> t2.ID
AND t.FeatureName = t2.FeatureName
AND t.Value = t2.Value)
"How can I write a fast query for this purpose?"
If it is not fast enough, create an index on FeatureName + Value.
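As a rough illustration of the index suggestion, the sketch below creates a composite index on (FeatureName, Value) and runs the EXISTS query on the question's sample rows, using SQLite through Python's sqlite3 module as a stand-in for the real server. The index name IX_Table1_FeatureName_Value is hypothetical, the dbo. schema prefix is dropped because SQLite has no such schema, and whether the optimizer actually uses the index depends on the engine and the data.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [
    (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
    (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
    (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
])

# Composite index on the two columns compared inside the EXISTS subquery.
# The index name is made up for this sketch.
conn.execute(
    "CREATE INDEX IX_Table1_FeatureName_Value ON Table1 (FeatureName, Value)"
)

# The EXISTS query from this answer, with an ORDER BY added for stable output.
rows = conn.execute("""
    SELECT t.Id, t.FeatureName, t.Value
    FROM Table1 t
    WHERE t.FeatureName LIKE 'A%'
      AND EXISTS (SELECT 1 FROM Table1 t2
                  WHERE t.Id <> t2.Id
                    AND t.FeatureName = t2.FeatureName
                    AND t.Value = t2.Value)
    ORDER BY t.Id, t.FeatureName
""").fetchall()
print(rows)
```

On the sample data this also returns the ABB row of Id = 2, since ABB = 12 is shared with other Ids. The query therefore finds shared (FeatureName, Value) pairs, not Ids whose whole set of 'A' features matches.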
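Answer 1 (score: 0)
I tried to get rid of the extra join back to MyTable that would be needed to select the data for the Ids with matching FeatureName and Value values. Here is the query: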
with joined_set as
(
SELECT
mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
from
(
select *
from mytable
where featurename like 'A%'
) mt1
left join
(
select *
from mytable
where featurename like 'A%'
) mt2
on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in
(select id
from joined_set
group by id
having SUM(
CASE
WHEN mt2_id is null THEN 1
ELSE 0
END
) <> 0
);
Here is a SQL Fiddle demo. It has one additional condition in the inline view mt2 so that the search is performed only for id = 1.
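The demo itself is not reproduced here, so the following is only a guess at what that extra condition looks like: the same query with "and id = 1" added to the inline view mt2, run on the question's sample rows in SQLite through Python's sqlite3 module. The exact form of the condition and the name ids are assumptions of this sketch.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, featurename TEXT, value INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
    (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
    (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
])

query = """
with joined_set as
(
    select mt1.*, mt2.id as mt2_id
    from (select * from mytable where featurename like 'A%') mt1
    left join
         -- assumed extra condition: compare only against the rows of id = 1
         (select * from mytable where featurename like 'A%' and id = 1) mt2
      on mt2.id <> mt1.id
     and mt2.featurename = mt1.featurename
     and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in (select id
                 from joined_set
                 group by id
                 -- an id is rejected if any of its 'A' rows found no partner
                 having sum(case when mt2_id is null then 1 else 0 end) <> 0)
"""
ids = sorted(row[0] for row in conn.execute(query))
print(ids)
```

Written this way, the query checks that every 'A' feature of a candidate Id also appears with the same value in Id = 1. It does not check the opposite direction, so a candidate holding only a subset of Id 1's 'A' features would still be returned.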
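Answer 2 (score: 0)
I'm a bit dense this morning and I'm not sure whether you only want the Ids or... Here is my take on it... You could move the FeatureName like 'A%' condition into the inner query to filter the data during the initial table scan.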
with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'
order by FeatureName, Value, Id
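For reference, the variant with the FeatureName filter moved into the CTE could look like the sketch below, run here on the question's sample rows in SQLite through Python's sqlite3 module. The aliases d and t and the name rows are mine; everything else follows the query above.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
    (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
    (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
])

query = """
with dupFeatures (FeatureName, Value, dupCount) as
(
    select FeatureName, Value, count(*)
    from MyTable
    where FeatureName like 'A%'   -- filter applied before grouping
    group by FeatureName, Value
    having count(*) > 1
)
select t.Id, d.FeatureName, d.Value
from dupFeatures d
join MyTable t on t.FeatureName = d.FeatureName
              and t.Value = d.Value
order by d.FeatureName, d.Value, t.Id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Filtering inside the CTE means only the 'A' rows are grouped, which should reduce the work of the aggregation; how much that helps depends on the share of 'A' features in the real table.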
Answer 3 (score: 0)
A general solution is
With Rows As (
select id
, FeatureName
, Value
, rows = Count(id) OVER (PARTITION BY id)
FROM test
WHERE FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM Rows a
INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName
and a.rows = b.rows
GROUP BY a.id, b.id
ORDER BY a.id, b.id
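To restrict the solution to a single group, just add a WHERE condition on a.ID to the main query. The CTE is needed to get the correct row count for each id.
SQLFiddle demo. In the demo I changed the test data slightly, adding an ID that has only one of the FeatureNames of IDs 1 and 3.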
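The query as shown above joins on FeatureName and on the row count but has no comparison of Value and no HAVING clause. The sketch below is one possible way to add both checks so that only pairs with identical 'A' features and values are returned. It is my own variation, not the answer's exact query: the per-id count is computed with a grouped CTE instead of a window function, the names a_rows, counts, n and pairs are made up, the row for Id 4 is a hypothetical extra in the spirit of the demo, and (Id, FeatureName) is assumed to be unique.

```python
import sqlite3

# Sample rows from the question plus one hypothetical Id (4) that has
# only one of the 'A' features shared by Ids 1 and 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (Id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
    (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
    (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
    (4, "AAA", 10),
])

query = """
with a_rows as (
    select Id, FeatureName, Value
    from test
    where FeatureName like 'A%'
), counts as (
    -- number of 'A' features per Id (assumes one row per Id and feature)
    select Id, count(*) as n
    from a_rows
    group by Id
)
select a.Id as aID, b.Id as bID
from a_rows a
join a_rows b on a.Id < b.Id
             and a.FeatureName = b.FeatureName
             and a.Value = b.Value
join counts ca on ca.Id = a.Id
join counts cb on cb.Id = b.Id
where ca.n = cb.n              -- both Ids have the same number of 'A' features
group by a.Id, b.Id, ca.n
having count(*) = ca.n         -- and every one of them matched
order by a.Id, b.Id
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Adding a condition such as a.Id = 1 to the WHERE clause restricts the pairs to a single Id, as described above.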