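I need to select matched pairs from two tables that hold similarly structured data. A "matched pair" here means two rows that reference each other through the "match" column.
Single-table matched pair example: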
TABLE
----
id | matchid
1 | 2
2 | 1
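IDs 1 and 2 are a matched pair because each has a match entry pointing to the other.
Now the real question: what is the best (fastest) way to select the matched pairs that appear in both tables: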
Table ONE (id, matchid)
Table TWO (id, matchid)
Sample data:
ONE              TWO
----             ----
id | matchid     id | matchid
1  | 2           2  | 3
2  | 3           3  | 2
3  | 2
4  | 5
5  | 4
The desired result is a single row with IDs 2 and 3.
RESULT
----
id | id
2 | 3
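This is because 2 & 3 are a matched pair in both table ONE and table TWO. 4 & 5 are a matched pair in ONE but not in TWO, so we do not select them. 1 and 2 are not a matched pair at all, because 2 has no match entry for 1.
I can get the matched pairs from a single table with: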
SELECT a.id, b.id
FROM ONE a JOIN ONE b
ON a.id = b.matchid AND a.matchid = b.id
WHERE a.id < b.id
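To make the single-table query easy to try, here is a minimal sketch that runs it against the sample rows of table ONE. It assumes an in-memory SQLite database driven from Python; the function name matched_pairs_one and the ORDER BY clause are illustrative additions, not part of the original query.

```python
import sqlite3

# Sample rows of table ONE from the question: (id, matchid).
ONE_ROWS = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]


def matched_pairs_one(rows):
    """Return the mutually referencing pairs found in a single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
    conn.executemany("INSERT INTO one VALUES (?, ?)", rows)
    pairs = conn.execute(
        """
        SELECT a.id, b.id
        FROM one a JOIN one b
          ON a.id = b.matchid AND a.matchid = b.id
        WHERE a.id < b.id      -- report each pair once, smaller id first
        ORDER BY a.id          -- added only to make the output deterministic
        """
    ).fetchall()
    conn.close()
    return pairs


print(matched_pairs_one(ONE_ROWS))  # [(2, 3), (4, 5)]
```

Row (1, 2) drops out because row 2 points at 3 rather than back at 1.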
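How should I build a query that selects only the matched pairs that appear in both tables?
Should I:
(Since this is an efficiency question, it is worth noting that matches will be very sparse, perhaps 1 in 1000 or fewer, and each table will have more than 100,000 rows.)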
Answer 0: (score: 1)
I think I understand your point. You want to keep only the pairs that exist in both tables.
SELECT LEAST(a.ID, a.MatchID) ID, GREATEST(a.ID, a.MatchID) MatchID
FROM One a
INNER JOIN Two b
ON a.ID = b.ID AND
a.matchID = b.matchID
GROUP BY LEAST(a.ID, a.MatchID), GREATEST(a.ID, a.MatchID)
HAVING COUNT(*) > 1
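The join-and-count idea can be checked against the sample data with a short sketch. It assumes SQLite, which has no LEAST/GREATEST, so the two-argument scalar MIN/MAX functions are swapped in, and the normalisation is moved into a subquery for clarity. The function name pairs_in_both is hypothetical. The approach also assumes neither table contains duplicate rows, since a duplicated row would inflate COUNT(*).

```python
import sqlite3

ONE_ROWS = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]
TWO_ROWS = [(2, 3), (3, 2)]


def pairs_in_both(one_rows, two_rows):
    """Pairs whose two directed rows both appear in ONE and in TWO."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
    conn.execute("CREATE TABLE two (id INTEGER, matchid INTEGER)")
    conn.executemany("INSERT INTO one VALUES (?, ?)", one_rows)
    conn.executemany("INSERT INTO two VALUES (?, ?)", two_rows)
    result = conn.execute(
        """
        SELECT lo, hi
        FROM (
            -- Rows present in both tables, normalised to (smaller, larger).
            -- SQLite's two-argument MIN/MAX stand in for LEAST/GREATEST.
            SELECT MIN(a.id, a.matchid) AS lo, MAX(a.id, a.matchid) AS hi
            FROM one a
            INNER JOIN two b ON a.id = b.id AND a.matchid = b.matchid
        )
        GROUP BY lo, hi
        HAVING COUNT(*) > 1    -- both directions survived the join
        ORDER BY lo
        """
    ).fetchall()
    conn.close()
    return result


print(pairs_in_both(ONE_ROWS, TWO_ROWS))  # [(2, 3)]
```

A pair is reported only when both of its directed rows survive the join, which is what the COUNT(*) > 1 filter expresses.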
Answer 1: (score: 0)
Try this query:
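-- CONCAT builds the "id~matchid" key; in MySQL, + would add the values numerically
select
O.id,
O.matchid
from
ONE O
where
CONCAT(CAST(O.id as CHAR(50)), '~', CAST(O.matchid as CHAR(50)))
in (select CONCAT(CAST(T.id as CHAR(50)), '~', CAST(T.matchid as CHAR(50))) from TWO T)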
Edited query:
-- CONCAT builds the "id~matchid" key; in MySQL, + would add the values numerically
select distinct
Least(O.id,O.matchid) ID,
Greatest(O.id,O.matchid) MatchID
from
ONE O
where
CONCAT(CAST(O.id as CHAR(50)), '~', CAST(O.matchid as CHAR(50)))
in (select CONCAT(CAST(T.id as CHAR(50)), '~', CAST(T.matchid as CHAR(50))) from TWO T)
and CONCAT(CAST(O.matchid as CHAR(50)), '~', CAST(O.id as CHAR(50)))
in (select CONCAT(CAST(T.id as CHAR(50)), '~', CAST(T.matchid as CHAR(50))) from TWO T)
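The composite-key idea behind this answer (encode each row as one "id~matchid" string and test membership) can be illustrated outside the database. The sketch below is plain Python with hypothetical helper names, not the answer's own code. It shows why the '~' separator matters, and it also checks that the reversed row exists in ONE. The edited query above does not check that, so this extra condition is an assumption about the intended behaviour.

```python
def pair_key(id_, matchid):
    """Encode a row as a single string key, e.g. (2, 3) -> '2~3'."""
    return f"{id_}~{matchid}"


def pairs_via_keys(one_rows, two_rows):
    """Pairs whose two directed rows appear in both row lists."""
    one_keys = {pair_key(i, m) for i, m in one_rows}
    two_keys = {pair_key(i, m) for i, m in two_rows}
    found = set()
    for i, m in one_rows:
        forward, backward = pair_key(i, m), pair_key(m, i)
        # backward in one_keys is the extra check not present in the query.
        if forward in two_keys and backward in two_keys and backward in one_keys:
            found.add((min(i, m), max(i, m)))
    return sorted(found)


# Without a separator, distinct rows can collapse to the same key.
assert f"{1}{23}" == f"{12}{3}"
assert pair_key(1, 23) != pair_key(12, 3)

ONE_ROWS = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]
TWO_ROWS = [(2, 3), (3, 2)]
print(pairs_via_keys(ONE_ROWS, TWO_ROWS))  # [(2, 3)]
```

Building string keys inside SQL usually prevents the database from using an index on (id, matchid), which may matter at the table sizes mentioned in the question.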
Answer 2: (score: 0)
Naive version, which checks that all four required rows exist:
-- EXPLAIN ANALYZE
WITH both_one AS (
SELECT o.id, o.matchid
FROM one o
WHERE o.id < o.matchid
AND EXISTS ( SELECT * FROM one x WHERE x.id = o.matchid AND x.matchid = o.id)
)
, both_two AS (
SELECT t.id, t.matchid
FROM two t
WHERE t.id < t.matchid
AND EXISTS ( SELECT * FROM two x WHERE x.id = t.matchid AND x.matchid = t.id)
)
SELECT *
FROM both_one oo
WHERE EXISTS (
SELECT *
FROM both_two tt
WHERE tt.id = oo.id AND tt.matchid = oo.matchid
);
This one is simpler:
-- EXPLAIN ANALYZE
WITH pair AS (
SELECT o.id, o.matchid
FROM one o
WHERE EXISTS ( SELECT * FROM two x WHERE x.id = o.id AND x.matchid = o.matchid)
)
SELECT *
FROM pair pp
WHERE EXISTS (
SELECT *
FROM pair xx
WHERE xx.id = pp.matchid AND xx.matchid = pp.id
)
AND pp.id < pp.matchid
;
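As a sanity check, the two CTE queries can be run side by side on the sample data. The sketch below assumes SQLite (CTEs need version 3.8.3 or later) driven from Python. The composite indexes on (id, matchid) are a hypothetical addition that is not part of the answer. They are intended to let each EXISTS probe be an index lookup, though whether that helps on the real 100,000-row tables would need to be confirmed with EXPLAIN on the actual database.

```python
import sqlite3

NAIVE_SQL = """
WITH both_one AS (
    SELECT o.id, o.matchid FROM one o
    WHERE o.id < o.matchid
      AND EXISTS (SELECT * FROM one x WHERE x.id = o.matchid AND x.matchid = o.id)
), both_two AS (
    SELECT t.id, t.matchid FROM two t
    WHERE t.id < t.matchid
      AND EXISTS (SELECT * FROM two x WHERE x.id = t.matchid AND x.matchid = t.id)
)
SELECT * FROM both_one oo
WHERE EXISTS (SELECT * FROM both_two tt
              WHERE tt.id = oo.id AND tt.matchid = oo.matchid)
"""

SIMPLE_SQL = """
WITH pair AS (
    SELECT o.id, o.matchid FROM one o
    WHERE EXISTS (SELECT * FROM two x WHERE x.id = o.id AND x.matchid = o.matchid)
)
SELECT * FROM pair pp
WHERE EXISTS (SELECT * FROM pair xx
              WHERE xx.id = pp.matchid AND xx.matchid = pp.id)
  AND pp.id < pp.matchid
"""


def run_both(one_rows, two_rows):
    """Run both queries on the same data and return their sorted results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
    conn.execute("CREATE TABLE two (id INTEGER, matchid INTEGER)")
    # Hypothetical composite indexes, added for this sketch only.
    conn.execute("CREATE INDEX one_id_matchid ON one (id, matchid)")
    conn.execute("CREATE INDEX two_id_matchid ON two (id, matchid)")
    conn.executemany("INSERT INTO one VALUES (?, ?)", one_rows)
    conn.executemany("INSERT INTO two VALUES (?, ?)", two_rows)
    naive = sorted(conn.execute(NAIVE_SQL).fetchall())
    simple = sorted(conn.execute(SIMPLE_SQL).fetchall())
    conn.close()
    return naive, simple


ONE_ROWS = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]
TWO_ROWS = [(2, 3), (3, 2)]
print(run_both(ONE_ROWS, TWO_ROWS))  # ([(2, 3)], [(2, 3)])
```

Both queries return the single pair (2, 3) on the sample data. The simpler one first intersects the two tables and then looks for reversed rows only within that intersection, which should be small when matches are sparse.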