I am re-asking this question in a simplified and expanded form.
Consider these SQL statements:
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
With sqlite, the select statement returns:
id avg1
---------- ----------
106 4.5
107 4.0
and MySQL returns:
+------+--------+
| id | avg1 |
+------+--------+
| 106 | 4.5000 |
+------+--------+
As far as I can tell, MySQL's result is correct and sqlite's is not. I tried casting to real in sqlite, as shown in the following, but it still returns two records:
select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Why does sqlite return two records?
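To check a particular SQLite build quickly, the statements can be run through Python's built-in sqlite3 module, which also reports the version of the SQLite library it is linked against. This is only a reproduction sketch: the helper name `run_repro` is hypothetical, and whether one or two rows come back depends on the SQLite version in use.

```python
import sqlite3

# Same table and data as in the question.
SETUP = """
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
"""

# The query from the question, unchanged.
QUERY = """
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);
"""


def run_repro():
    """Run the question's query on a fresh in-memory database and return its rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# The row count is what differs between affected and unaffected builds.
print("SQLite library version:", sqlite3.sqlite_version)
for row in run_repro():
    print(row)
```

If two rows are printed, the linked library shows the behaviour described above; a single row for id 106 matches the MySQL result.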
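Quick update:
I ran the statement against the latest sqlite version (3.7.11) and still got two records.
Another update:
I sent an email about this issue to sqlite-users@sqlite.org.
Myself, I have been playing with the VDBE and found something interesting. I split the execution trace of each not exists loop (one per avg group).
To have three avg groups, I used the following statements: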
create table foo (id VARCHAR(1), score INT);
insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);
PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;
select avg(score) avg1
from foo
group by id
having not exists (
select avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
In the trace we clearly see that what should be r:4.5 has become i:5.
I am now trying to understand why.
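The vdbe_listing and vdbe_trace pragmas only have an effect when SQLite is compiled with SQLITE_DEBUG. On a regular build, the compiled VDBE program can still be listed with EXPLAIN, although that shows the opcodes only, not the register values seen in a trace. A minimal sketch using Python's sqlite3 module follows; the helper name `explain_opcodes` is hypothetical, and the opcode listing differs between SQLite versions.

```python
import sqlite3


def explain_opcodes(conn, sql):
    """Return (addr, opcode) pairs of the VDBE program SQLite compiles for sql.

    EXPLAIN rows are (addr, opcode, p1, p2, p3, p4, p5, comment); only the
    first two columns are kept here.
    """
    return [(row[0], row[1]) for row in conn.execute("EXPLAIN " + sql)]


conn = sqlite3.connect(":memory:")
# Same three-group data set as in the question.
conn.executescript("""
create table foo (id VARCHAR(1), score INT);
insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);
""")

QUERY = """
select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1)
"""

program = explain_opcodes(conn, QUERY)
for addr, opcode in program:
    print(addr, opcode)
conn.close()
```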
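Final edit:
So I have been playing with the sqlite source code. I understand the beast much better now, although I will let the original developer sort it out, as he seems to be doing so already: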
http://www.sqlite.org/src/info/430bb59d79
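Interestingly, at least to me, it seems that newer versions (some time after the version I am using) support inserting multiple records in one statement, as used in the test case added in the commit above: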
CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);
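Answer 0 (score: 1)
I tried messing with some variants of the query.
It seems that sqlite gets it wrong when a previously declared field is used in a nested HAVING expression.
In your example, avg1 under the second having is always equal to 5.0.
Look: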
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);
This one returns nothing, but executing the following query returns both records:
...
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);
I could not find any similar bug in the sqlite tickets list.
Answer 1 (score: 1)
Let's look at this in two ways. I'll use postgres 9.0 as my reference database.
(1)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for
-- any row in T1
having avg2 > avg1);
id | avg1
-----+--------------------
106 | 4.5000000000000000
(1 row)
Then let's move some of the logic into the subquery and get rid of the 'not': (2)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal to the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows, as we are in effect doing a
-- cartesian join with the 'having'; we just don't display the cartesian row set
id | avg1
-----+--------------------
106 | 4.5000000000000000
107 | 4.0000000000000000
(2 rows)
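So you have to ask yourself what you actually mean when you put this correlated subquery in the having clause. If it evaluates every row of the subquery against every row of the main query, we are doing a cartesian join, and I don't think we should be pointing fingers at the SQL engine.
If you want every row that is less than the maximum average, what you should be saying is: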
select T1.id, avg(T1.score) avg1
from foo T1 group by T1.id
having avg1 not in
(select max(avg1) from (select id,avg(score) avg1 from foo group by id))
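To see what this last query selects on the sample data, it can be run through Python's sqlite3 module. This is just a sketch under the assumption that sqlite is the engine being checked; the variable name `below_max` is mine. The query keeps the groups whose average differs from the maximum average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question.
conn.executescript("""
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
""")

# The uncorrelated subquery computes the maximum group average once;
# the outer HAVING keeps every group whose average is not that maximum.
below_max = conn.execute("""
select T1.id, avg(T1.score) avg1
from foo T1 group by T1.id
having avg1 not in
    (select max(avg1) from (select id, avg(score) avg1 from foo group by id))
""").fetchall()

print(below_max)
conn.close()
```

Because the subquery is not correlated with the outer query, it does not depend on how the engine resolves the outer alias inside a nested HAVING.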
Answer 2 (score: 0)
Have you tried this version?
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg(T1.score));
And this one (it should give the same results):
select T1.*
from
( select id, avg(score) avg1
from foo
group by id
) T1
where not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg1);
The query can also be written with derived tables instead of a subquery in the HAVING clause:
select ta.id, ta.avg1
from
( select id, avg(score) avg1
from foo
group by id
) ta
JOIN
( select avg(score) avg1
from foo
group by id
order by avg1 DESC
LIMIT 1
) tmp
ON tmp.avg1 = ta.avg1
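A small harness along these lines can compare the derived-table rewrites on the sample data. It is a sketch using Python's sqlite3 module; the helper `run` and the constant names are hypothetical, and I have not verified the rewrites against sqlite 3.7.11 itself.

```python
import sqlite3

SETUP = """
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
"""

# Rewrite 1: averages computed in a derived table, NOT EXISTS moved to WHERE.
WHERE_NOT_EXISTS = """
select T1.*
from (select id, avg(score) avg1 from foo group by id) T1
where not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg1)
"""

# Rewrite 2: join the per-id averages against the single highest average.
DERIVED_JOIN = """
select ta.id, ta.avg1
from (select id, avg(score) avg1 from foo group by id) ta
join (select avg(score) avg1 from foo group by id
      order by avg1 desc limit 1) tmp
  on tmp.avg1 = ta.avg1
"""


def run(sql):
    """Execute sql against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


where_rows = run(WHERE_NOT_EXISTS)
join_rows = run(DERIVED_JOIN)
print(where_rows)
print(join_rows)
```

Both forms refer to avg1 as an ordinary column of a derived table, so neither relies on a result alias being visible inside a nested HAVING.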