I have a table (TestFI) with the following data:
FIID Email
---------
null a@a.com
1 a@a.com
null b@b.com
2 b@b.com
3 c@c.com
4 c@c.com
5 c@c.com
null d@d.com
null d@d.com
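I need the records whose Email appears exactly twice, with one row where FIID is null and one row where it is not. For the data above, only a@a.com and b@b.com meet this requirement.
I was able to build a multi-level query like this: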
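The requirement can be restated as a small plain-Python check over the sample rows, which is handy for verifying the SQL queries in this thread. This is only a sketch: the helper name matching_emails is hypothetical, and None stands in for SQL NULL.

```python
from collections import defaultdict

# Sample rows from the question as (FIID, Email); None stands in for SQL NULL.
ROWS = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]


def matching_emails(rows):
    """Return emails that occur exactly twice: once with a NULL FIID, once without."""
    fiids_by_email = defaultdict(list)
    for fiid, email in rows:
        fiids_by_email[email].append(fiid)
    result = []
    for email, fiids in fiids_by_email.items():
        null_count = sum(1 for f in fiids if f is None)
        # Exactly two rows, and exactly one of them has a NULL FIID.
        if len(fiids) == 2 and null_count == 1:
            result.append(email)
    return sorted(result)


print(matching_emails(ROWS))  # ['a@a.com', 'b@b.com']
```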
Select
    FIID,
    Email
from
    TestFI
where
    Email in
    (
        Select
            Email
        from
        (
            Select
                Email
            from
                TestFI
            where
                Email in
                (
                    select
                        Email
                    from
                        TestFI
                    where
                        FIID is null or FIID is not null
                    group by Email
                    having
                        count(Email) = 2
                )
                and
                FIID is null
        ) as Temp1
        group by Email
        having count(Email) = 1
    )
However, it took nearly 10 minutes to run against 10 million records. Is there a better way? I know I must be doing something silly here.
Thanks
Answer 0 (score: 7)
I would try this query:
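SELECT Email, MAX(FIID)
FROM TestFI
GROUP BY Email
HAVING COUNT(*)=2 AND COUNT(FIID)=1
It returns each Email together with its non-null FIID value; the other row for that Email has a NULL FIID. This works because COUNT(*) counts every row in the group, while COUNT(FIID) counts only the non-null values.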
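To check the aggregate query against the sample data, here is a minimal sketch that loads the rows into an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the extra ORDER BY (added so the output is deterministic) are assumptions for illustration; the HAVING logic is the one from this answer.

```python
import sqlite3

# Sample rows from the question as (FIID, Email); None becomes SQL NULL.
SAMPLE = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", SAMPLE)

# COUNT(*) counts every row in the group; COUNT(FIID) skips NULLs,
# so the HAVING clause keeps groups of two rows with exactly one NULL FIID.
QUERY = """
SELECT Email, MAX(FIID)
FROM TestFI
GROUP BY Email
HAVING COUNT(*) = 2 AND COUNT(FIID) = 1
ORDER BY Email
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('a@a.com', 1), ('b@b.com', 2)]
```

Only one pass over the table is needed, which is why this form is usually much cheaper than the nested IN subqueries in the question.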
Answer 1 (score: 1)
With an index on (Email, FIID), I would be tempted to try:
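select tnull.*, tnotnull.*
from testfi tnull join
     testfi tnotnull
     on tnull.email = tnotnull.email left outer join
     testfi tnothing
     -- as written, tnothing can also match tnull and tnotnull themselves, so this
     -- condition also needs to exclude those two rows (e.g. via a unique row key)
     on tnull.email = tnothing.email
where tnothing.email is null and
      tnull.FIID is null and
      tnotnull.FIID is not null;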
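Performance certainly depends on the database. This keeps all of the access within the index. In some databases, aggregation may be faster. Performance also depends on how selective the query is. For example, if there is just one NULL record and you have an index on (FIID, Email), this should be much faster than aggregation.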
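The anti-join idea only works once the third alias is prevented from matching the two rows already chosen. A minimal sketch, assuming SQLite (driven from Python's sqlite3 module), whose ordinary tables carry an implicit rowid that can serve as the row identity; on other databases a unique key column would have to play that role. The index name idx_testfi_email_fiid is hypothetical.

```python
import sqlite3

# Sample rows from the question as (FIID, Email); None becomes SQL NULL.
SAMPLE = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", SAMPLE)
# Index on (Email, FIID), as suggested in the answer.
conn.execute("CREATE INDEX idx_testfi_email_fiid ON TestFI (Email, FIID)")

# tnull/tnotnull pick one NULL row and one non-NULL row for the same Email;
# tnothing looks for any *other* row with that Email (SQLite-specific rowid).
QUERY = """
SELECT tnull.Email, tnotnull.FIID
FROM TestFI AS tnull
JOIN TestFI AS tnotnull
  ON tnotnull.Email = tnull.Email
LEFT OUTER JOIN TestFI AS tnothing
  ON tnothing.Email = tnull.Email
 AND tnothing.rowid NOT IN (tnull.rowid, tnotnull.rowid)
WHERE tnothing.Email IS NULL
  AND tnull.FIID IS NULL
  AND tnotnull.FIID IS NOT NULL
ORDER BY tnull.Email
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('a@a.com', 1), ('b@b.com', 2)]
```

Excluding the two chosen rows by identity is what lets "no third row" be expressed as a LEFT JOIN ... IS NULL; matching on Email alone would always find tnull itself.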
Answer 2 (score: 0)
Maybe something like this:
select
a.FIID,
a.Email
from
TestFI a
inner join TestFI b on (a.Email=b.Email)
where
a.FIID is not null
and b.FIID is null
;
And make sure Email and FIID are indexed.
Answer 3 (score: 0)
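A sketch that runs this self-join against the sample data in SQLite via Python's sqlite3 module (SQLite and the ORDER BY are assumptions for a reproducible check). Note that the join tests for at least one NULL row and one non-NULL row per Email rather than exactly two rows; the second run, after inserting a hypothetical extra row, makes that visible.

```python
import sqlite3

# Sample rows from the question as (FIID, Email); None becomes SQL NULL.
SAMPLE = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", SAMPLE)

# Pair each non-NULL row (a) with a NULL row (b) that has the same Email.
QUERY = """
SELECT a.FIID, a.Email
FROM TestFI a
INNER JOIN TestFI b ON a.Email = b.Email
WHERE a.FIID IS NOT NULL AND b.FIID IS NULL
ORDER BY a.Email, a.FIID
"""

before = conn.execute(QUERY).fetchall()
print(before)  # [(1, 'a@a.com'), (2, 'b@b.com')]

# Hypothetical third row for a@a.com: the join still matches that Email.
conn.execute("INSERT INTO TestFI (FIID, Email) VALUES (6, 'a@a.com')")
after = conn.execute(QUERY).fetchall()
print(after)  # [(1, 'a@a.com'), (6, 'a@a.com'), (2, 'b@b.com')]
```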
I need records that appear exactly twice AND have 1 row with FIID is null and one is not
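In the innermost select, group by email, keep only the emails with count = 2, and take the smallest and largest FIID per email, mapping NULL to -1 (this assumes real FIID values are never negative):
select email, min(coalesce(FIID,-1)) as MinFIID, max(coalesce(FIID,-1)) as MaxFIID from TestFI
group by email having count(email) = 2
Then keep the emails whose smallest value is the NULL placeholder and whose largest is a real FIID:
select email
from
(
select email, min(coalesce(FIID,-1)) as MinFIID, max(coalesce(FIID,-1)) as MaxFIID from TestFI
group by email having count(email) = 2
) as X
where MinFIID = -1 and MaxFIID > -1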
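The COALESCE/MIN/MAX idea can also be written as a single GROUP BY with all three conditions in HAVING. A minimal sketch, assuming SQLite via Python's sqlite3 module and a -1 placeholder that never occurs as a real FIID (real FIID values are assumed to be non-negative):

```python
import sqlite3

# Sample rows from the question as (FIID, Email); None becomes SQL NULL.
SAMPLE = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", SAMPLE)

# With exactly two rows, MIN = -1 means one FIID was NULL and
# MAX > -1 means the other one was a real (non-negative) value.
QUERY = """
SELECT Email
FROM TestFI
GROUP BY Email
HAVING COUNT(*) = 2
   AND MIN(COALESCE(FIID, -1)) = -1
   AND MAX(COALESCE(FIID, -1)) > -1
ORDER BY Email
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('a@a.com',), ('b@b.com',)]
```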