Conditional aggregation performance

Time: 2019-07-08 08:46:04

Tags: sql h2 conditional-aggregation

Suppose we have two tables.

A(id int primary key, groupby int, fkb int, search int, padding varchar(1000))
B(id int primary key, groupby int, search int)

They are created with the following script. The first table is large (one million rows) and the second is small (ten thousand rows).

CREATE  TABLE A(
  id int not null primary key, 
  groupby int null, 
  fkb int null, 
  search int null,
  padding varchar(1000) null
)  AS
WITH x AS
(
  SELECT 0 n FROM dual
  union all
  SELECT 1 FROM dual
  union all
  SELECT 2 FROM dual
  union all
  SELECT 3 FROM dual
  union all
  SELECT 4 FROM dual
  union all
  SELECT 5 FROM dual
  union all
  SELECT 6 FROM dual
  union all
  SELECT 7 FROM dual
  union all
  SELECT 8 FROM dual
  union all
  SELECT 9 FROM dual
), t1 AS
(
  SELECT ones.n + 10 * tens.n + 100 * hundreds.n + 1000 * thousands.n + 10000 * tenthousands.n + 100000 * hundredthousands.n as id
  FROM x ones,     x tens,      x hundreds,       x thousands,       x tenthousands,       x hundredthousands
), t2 AS
(
    SELECT  id,
            mod(id, 100) groupby
    FROM t1
)
SELECT  cast(id as int) id,
        cast(groupby as int) groupby,
        cast(mod(orderby, 9173) as int) fkb,
        cast(mod(id, 911) as int) search
FROM t2;

CREATE  TABLE B(
  id int not null primary key, 
  groupby int null, 
  search int null
) AS
WITH x AS 
(
  SELECT 0 n FROM dual
  union all 
  SELECT 1 FROM dual
  union all 
  SELECT 2 FROM dual
  union all 
  SELECT 3 FROM dual
  union all 
  SELECT 4 FROM dual
  union all 
  SELECT 5 FROM dual
  union all 
  SELECT 6 FROM dual
  union all 
  SELECT 7 FROM dual
  union all 
  SELECT 8 FROM dual
  union all 
  SELECT 9 FROM dual  
), t1 AS
(
  SELECT ones.n + 10 * tens.n + 100 * hundreds.n + 1000 * thousands.n as id  
  FROM x ones,     x tens,      x hundreds,       x thousands       
)
SELECT  cast(id as int) id,
        cast(mod(id + floor(100000 / (id+1)) , 100) as int) groupby,
        cast(mod(id, 901) as int) search,
        rpad(concat('Value ', id), 1000, '*') as padding
FROM t1;

I want H2 to process the following conditional aggregation query as fast as possible, but without adding any extra indexes.

SELECT  B.groupby,
       count(CASE WHEN A.search = 1 THEN 1 END) as search1,
       count(CASE WHEN A.search = 900 THEN 1 END) as search2
FROM B
LEFT JOIN A ON A.fkb = B.id
WHERE B.search < 10
GROUP BY B.groupby

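Is it possible to rewrite the query so that it runs in at most 2 minutes? I have tried many different rewrites, but every one of them ran for several minutes. I set the Java virtual machine memory to 4 GB (-Xmx4G).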

If I run the same test in MySQL, the query is processed in under 10 seconds.

1 answer:

Answer 0: (score: 1)

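Your initialization script contains errors (for example, the SELECT for A references a column orderby that is not defined anywhere, and the padding column is generated for B instead of A), so I modified it as follows: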

CREATE  TABLE A(
  id int not null primary key, 
  groupby int null, 
  fkb int null, 
  search int null,
  padding varchar(1000) null
)  AS
SELECT  cast(x as int) id,
        cast(mod(x, 100) as int) groupby,
        cast(mod(mod(x, 100), 9173) as int) fkb,
        cast(mod(x, 911) as int) search,
        rpad(concat('Value ', x), 1000, '*') as padding
FROM SYSTEM_RANGE(0, 999999);

CREATE  TABLE B(
  id int not null primary key, 
  groupby int null, 
  search int null
) AS
SELECT  cast(x as int) id,
        cast(mod(x + floor(100000 / (x+1)), 100) as int) groupby,
        cast(mod(x, 901) as int) search
FROM SYSTEM_RANGE(0, 9999);

For simplicity, I also used the H2-specific SYSTEM_RANGE() function.
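
As a small illustration of what SYSTEM_RANGE() does: in H2 it returns one row per integer in the given inclusive range, in a column named X, which is why the scripts above refer to x. The bounds below are arbitrary and chosen only for this sketch.

```sql
-- H2 only: returns the rows 1, 2, 3, 4, 5 in a single column named X
SELECT x FROM SYSTEM_RANGE(1, 5);

-- The same generator can feed derived columns, as in the scripts above
SELECT x AS id,
       mod(x, 3) AS bucket   -- "bucket" is a hypothetical column name
FROM SYSTEM_RANGE(0, 9);
```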

Running the query with the EXPLAIN command shows the following execution plan:

SELECT
    "B"."GROUPBY",
    COUNT(CASE WHEN ("A"."SEARCH" = 1) THEN 1 END) AS "SEARCH1",
    COUNT(CASE WHEN ("A"."SEARCH" = 900) THEN 1 END) AS "SEARCH2"
FROM "PUBLIC"."B"
    /* PUBLIC.B.tableScan */
    /* WHERE B.SEARCH < 10
    */
LEFT OUTER JOIN "PUBLIC"."A"
    /* PUBLIC.A.tableScan */
    ON "A"."FKB" = "B"."ID"
WHERE "B"."SEARCH" < 10
GROUP BY "B"."GROUPBY"
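
To reproduce a plan like the one above, the query from the question can be prefixed with EXPLAIN. This is a sketch that assumes the tables A and B already exist in the PUBLIC schema; the exact text of the plan may differ between H2 versions.

```sql
-- Show the plan without executing the query
EXPLAIN
SELECT  B.groupby,
        count(CASE WHEN A.search = 1 THEN 1 END) AS search1,
        count(CASE WHEN A.search = 900 THEN 1 END) AS search2
FROM B
LEFT JOIN A ON A.fkb = B.id
WHERE B.search < 10
GROUP BY B.groupby;
```

H2 also accepts EXPLAIN ANALYZE, which executes the statement and annotates the plan with scan counts; on the unindexed tables that run takes as long as the query itself.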

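This is expected, because you don't have any indexes. Unfortunately, without them you cannot improve performance significantly.

I think you need a referential constraint here.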

ALTER TABLE A ADD CONSTRAINT A_FKB_FK FOREIGN KEY(FKB) REFERENCES B(ID);

With this constraint in place, H2 can use the index that backs it, and the execution plan is better:

SELECT
    "B"."GROUPBY",
    COUNT(CASE WHEN ("A"."SEARCH" = 1) THEN 1 END) AS "SEARCH1",
    COUNT(CASE WHEN ("A"."SEARCH" = 900) THEN 1 END) AS "SEARCH2"
FROM "PUBLIC"."B"
    /* PUBLIC.B.tableScan */
    /* WHERE B.SEARCH < 10
    */
LEFT OUTER JOIN "PUBLIC"."A"
    /* PUBLIC.A_FKB_FK_INDEX_4: FKB = B.ID */
    ON "A"."FKB" = "B"."ID"
WHERE "B"."SEARCH" < 10
GROUP BY "B"."GROUPBY"

With the constraint, your query takes about 11 seconds on my old PC.

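You can also use COUNT(*) FILTER (WHERE A.search = 1) in H2, but such a query is not compatible with MySQL, which does not support the standard SQL:2003 FILTER clause. The FILTER clause also does not really improve the performance of the query; it only makes it more readable.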
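
For reference, here is a sketch of the FILTER form of the query from the question, using the same tables and columns. It should return the same counts as the CASE version: for B rows without a match in A, A.search is NULL, so the filter condition is not true and the row is not counted.

```sql
-- H2 (standard SQL:2003 FILTER clause); MySQL rejects this syntax
SELECT  B.groupby,
        COUNT(*) FILTER (WHERE A.search = 1)   AS search1,
        COUNT(*) FILTER (WHERE A.search = 900) AS search2
FROM B
LEFT JOIN A ON A.fkb = B.id
WHERE B.search < 10
GROUP BY B.groupby;
```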