I have a big problem with SELECT DISTINCT values from one of my tables.
Table 1: T1
pid  box    cassette   seal     added (timestamp)
---------------------------------------------------
1    A1212  A01A00001  P123456  2015-01-01 12:00:01
2    A1212  A01A00001  P123457  2015-01-01 12:00:01
3    A1214  A01A00004  C123458  2015-01-01 12:00:01
4    A1214  A01B00005  D123459  2015-01-01 12:00:01
5    A1214  A01B00006  D123460  2015-01-01 12:00:01
6    A1212  A01B00007  E123461  2015-01-01 12:00:01
7    A1212  A01B00007  E123462  2015-01-01 12:00:01
Table 2: T2
id  t1_pid  box    cassette   seal     error  despatched
----------------------------------------------------------
1   3       A1214  A01A00004  C123458  false  true
2   7       A1212  A01B00007  E123462  true   false
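I need to SELECT all DISTINCT box, cassette and seal values from Table T1 that are either:
1. not in Table T2 and, most importantly, only the record whose pid is the highest (last added) for that box and cassette
or
2. in Table T2, but with T2.error=true or T2.despatched=false
The result should leave out the T1 records with these pid values:
1 - because record 2 has the same box and cassette, and record 1 has the lower pid
3 - because it is in T2 (t2.t1_pid=3) but has despatched=TRUE
6 - because record 7 has the same box and cassette, and record 6 has the lower pid
Record 7 should be in the result because it is in T2 (t2.t1_pid=7) but has error=TRUE.
Result table:
id  box    cassette   seal
-------------------------------------------
2   A1212  A01A00001  P123457  /(rec. no 2)
4   A1214  A01B00005  D123459  /(rec. no 4)
5   A1214  A01B00006  D123460  /(rec. no 5)
6   A1212  A01B00007  E123462  /(rec. no 6)
I have tried the following syntax, which works as long as the seal number is higher. I need to change the condition to "if t1.pid is higher", but I can't figure out how.
SELECT DISTINCT T1.pid, T1.box, T1.cassette, T1.seal
FROM T1 INNER JOIN
(SELECT T1.box, T1.cassette, max(T1.seal) as seal FROM
T1 LEFT OUTER JOIN T2 o ON T1.pid=o.t1_pid WHERE
(o.id IS NULL or (o.despatched=0 ))
GROUP BY T1.cassette, T1.box)
as b using (cassette, box, seal)
Many thanks for your help and your precious time.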
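Answer 0 (score: 1)
This task has nothing to do with DISTINCT, because we are not talking about duplicate records that have to be eliminated. It is rather an aggregation (i.e. boiling the results down to distinct box/cassette data).
You named two conditions for the T1 records:
Condition 1: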
where t1.pid not in (select t1_pid from t2)
and not exists
(
select *
from t1 as later
where later.box = t1.box
and later.cassette = t1.cassette
and later.pid > t1.pid
)
Condition 2:
where t1.pid in
(
select t1_pid
from t2
where t2.error = true
or t2.despatched = false
)
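But this is not sufficient, because we could still get multiple records for one box and cassette (one matching condition 1 and one matching condition 2, or several matching condition 2). In your comments you added a third condition.
Maybe you store your data so that a box/cassette can only ever match the two conditions once, but technically duplicates are at least possible, so we should find a way to deal with them. The simplest way is to group by box and cassette, which guarantees a single result record per box and cassette, and then show the minimum or maximum matching seal with it.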
select box, cassette, max(seal) as seal
from t1
where
(
  -- condition 1: not in T2 and no later record for the same box/cassette
  t1.pid not in (select t1_pid from t2)
  and not exists
  (
    select *
    from t1 as later
    where later.box = t1.box
    and later.cassette = t1.cassette
    and later.pid > t1.pid
  )
)
-- condition 2: in T2 with an error or not yet despatched
or t1.pid in
(
  select t1_pid
  from t2
  where t2.error = true
  or t2.despatched = false
)
group by box, cassette;
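To check the query against the sample data from the question, here is a minimal sketch that loads T1 and T2 into an in-memory SQLite database through Python's sqlite3 module. SQLite standing in for MySQL is an assumption made only to keep the example self-contained: booleans are stored as 0/1, the columns are spelled cassette and despatched as in the table headers, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (pid INTEGER PRIMARY KEY, box TEXT, cassette TEXT, seal TEXT, added TEXT);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_pid INTEGER, box TEXT, cassette TEXT,
                 seal TEXT, error INTEGER, despatched INTEGER);
INSERT INTO t1 VALUES
  (1, 'A1212', 'A01A00001', 'P123456', '2015-01-01 12:00:01'),
  (2, 'A1212', 'A01A00001', 'P123457', '2015-01-01 12:00:01'),
  (3, 'A1214', 'A01A00004', 'C123458', '2015-01-01 12:00:01'),
  (4, 'A1214', 'A01B00005', 'D123459', '2015-01-01 12:00:01'),
  (5, 'A1214', 'A01B00006', 'D123460', '2015-01-01 12:00:01'),
  (6, 'A1212', 'A01B00007', 'E123461', '2015-01-01 12:00:01'),
  (7, 'A1212', 'A01B00007', 'E123462', '2015-01-01 12:00:01');
INSERT INTO t2 VALUES
  (1, 3, 'A1214', 'A01A00004', 'C123458', 0, 1),
  (2, 7, 'A1212', 'A01B00007', 'E123462', 1, 0);
""")

query = """
SELECT box, cassette, MAX(seal) AS seal
FROM t1
WHERE (
        -- condition 1: not in T2 and no later record for the same box/cassette
        t1.pid NOT IN (SELECT t1_pid FROM t2)
    AND NOT EXISTS (
          SELECT 1 FROM t1 AS later
          WHERE later.box = t1.box
            AND later.cassette = t1.cassette
            AND later.pid > t1.pid)
      )
   -- condition 2: in T2 with an error or not yet despatched
   OR t1.pid IN (SELECT t1_pid FROM t2 WHERE error = 1 OR despatched = 0)
GROUP BY box, cassette
ORDER BY box, cassette
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data, condition 1 matches pid 2, 4 and 5, condition 2 matches pid 7, and no box/cassette pair matches more than once, so the grouping does not have to collapse anything here.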
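I am not showing the ID in the results, because I don't know how you arrive at them. You say they are T1 IDs, but they are not the IDs of the selected records: for example, you select record 2 (pid 2), yet in your results that record appears with ID 1, for reasons I don't understand.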
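If the ID you are after is simply the pid of the latest matching record per box and cassette (an assumption on my part, since the question does not say how the IDs are derived), one possible sketch is to aggregate MAX(pid) instead of MAX(seal) and join back to T1 to pick up that record's seal. The example runs on SQLite through Python's sqlite3 rather than MySQL, with booleans stored as 0/1; the alias names c and best are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (pid INTEGER PRIMARY KEY, box TEXT, cassette TEXT, seal TEXT);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_pid INTEGER, error INTEGER, despatched INTEGER);
INSERT INTO t1 VALUES
  (1, 'A1212', 'A01A00001', 'P123456'), (2, 'A1212', 'A01A00001', 'P123457'),
  (3, 'A1214', 'A01A00004', 'C123458'), (4, 'A1214', 'A01B00005', 'D123459'),
  (5, 'A1214', 'A01B00006', 'D123460'), (6, 'A1212', 'A01B00007', 'E123461'),
  (7, 'A1212', 'A01B00007', 'E123462');
INSERT INTO t2 VALUES (1, 3, 0, 1), (2, 7, 1, 0);
""")

query = """
SELECT t1.pid, t1.box, t1.cassette, t1.seal
FROM t1
JOIN (
    -- highest pid per box/cassette among the records matching either condition
    SELECT MAX(c.pid) AS pid
    FROM t1 AS c
    WHERE (
            c.pid NOT IN (SELECT t1_pid FROM t2)
        AND NOT EXISTS (
              SELECT 1 FROM t1 AS later
              WHERE later.box = c.box
                AND later.cassette = c.cassette
                AND later.pid > c.pid)
          )
       OR c.pid IN (SELECT t1_pid FROM t2 WHERE error = 1 OR despatched = 0)
    GROUP BY c.box, c.cassette
) AS best ON best.pid = t1.pid
ORDER BY t1.pid
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because pid identifies exactly one T1 record, joining on it returns the seal that belongs to the chosen record, whereas MAX(seal) picks the largest seal string, which may or may not come from the record with the highest pid.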
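Answer 1 (score: 0)
I have created the following syntax. It looks fine to me, but I am really not sure whether it is optimized for 1 billion records in tables T1 and T2.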
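SELECT abc.pid, abc.box, abc.cassette, abc.seal
FROM
(
  -- latest T1 record (highest pid) per box and cassette
  SELECT tt.pid, tt.box, tt.cassette, tt.seal
  FROM t1 tt
  INNER JOIN
  (
    SELECT box, MAX(pid) AS pid
    FROM t1
    WHERE added > DATE(NOW() - INTERVAL 2 DAY) /* for example */
    GROUP BY box, cassette
  ) groupedtt
  ON tt.box = groupedtt.box
  AND tt.pid = groupedtt.pid
) abc
-- T2 points at T1 through t1_pid (the T2 shown in the question has no pid column)
LEFT OUTER JOIN t2 o ON abc.pid = o.t1_pid
WHERE
(
  o.t1_pid IS NULL
  OR (
    o.despatched = 0
    -- assumes T2 also has an added column, which the question's T2 does not show
    AND o.added > DATE(NOW() - INTERVAL 2 DAY) /* for example */
  )
)
ORDER BY abc.box, abc.cassette;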
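A quick way to sanity-check this approach is to run it against the sample data from the question. The sketch below rests on several assumptions: it uses SQLite through Python's sqlite3 instead of MySQL, it joins T2 on t1_pid (the sample T2 has no pid column), and it drops the two-day date filter, because the sample rows are from 2015 and the sample T2 has no added column. The alias names g and latest are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (pid INTEGER PRIMARY KEY, box TEXT, cassette TEXT, seal TEXT);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_pid INTEGER, error INTEGER, despatched INTEGER);
INSERT INTO t1 VALUES
  (1, 'A1212', 'A01A00001', 'P123456'), (2, 'A1212', 'A01A00001', 'P123457'),
  (3, 'A1214', 'A01A00004', 'C123458'), (4, 'A1214', 'A01B00005', 'D123459'),
  (5, 'A1214', 'A01B00006', 'D123460'), (6, 'A1212', 'A01B00007', 'E123461'),
  (7, 'A1212', 'A01B00007', 'E123462');
INSERT INTO t2 VALUES (1, 3, 0, 1), (2, 7, 1, 0);
""")

query = """
SELECT latest.pid, latest.box, latest.cassette, latest.seal
FROM (
    -- step 1: the record with the highest pid per box and cassette
    SELECT tt.pid, tt.box, tt.cassette, tt.seal
    FROM t1 AS tt
    JOIN (SELECT MAX(pid) AS pid FROM t1 GROUP BY box, cassette) AS g
      ON tt.pid = g.pid
) AS latest
-- step 2: keep it if T2 does not know it, or knows it as not despatched
LEFT JOIN t2 AS o ON o.t1_pid = latest.pid
WHERE o.id IS NULL OR o.despatched = 0
ORDER BY latest.pid
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this keeps pid 2, 4, 5 and 7 and drops pid 3, which is despatched. It says nothing about how the query behaves at a billion rows; that would need to be measured on the real tables.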