Question

I have a very simple query that runs slowly in Informix 11, even though an appropriate index exists and is being used:

select COUNTRY, COUNT(*) from EVENTS group by COUNTRY

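Is there any reason why it should run slowly? In my experience with similar queries on SQL Server, they execute immediately when an appropriate index exists.

More information:

With 500,000 records in the EVENTS table, the query takes about 15 seconds (this worries me because this table will have millions of records, and I have already seen the execution time growing quickly).
The EVENTS table has an index on COUNTRY. Using the EXPLAIN directive, I verified that this index is being used.
The EVENTS table has many columns (about 70).
The "country" column is VARCHAR(32).
"country" has 25 distinct values.
The scan as reported by Informix: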

1) informix.EVENTS: INDEX PATH

    (1) Index Name: informix.country_ix
        Index Keys: COUNTRY   (Serial, fragments: ALL)

Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                EVENTS

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     501906     39285     501906     00:14.88   29390

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         4         501906     00:15.58   79761
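
For readers who want to reproduce this kind of output, a minimal sketch follows. It assumes a dbaccess session and the default behavior of writing the plan to a file named sqexplain.out in the current directory. Whether the "Query statistics" section appears depends on server configuration, so treat that part as an assumption.

```sql
-- Sketch only: turn plan capture on, run the query, turn it off again.
-- Assumption: the plan is appended to sqexplain.out in the working directory.
SET EXPLAIN ON;

select COUNTRY, COUNT(*) from EVENTS group by COUNTRY;

SET EXPLAIN OFF;
```

The line to look for in the output is the "Index Keys" line: a scan that is answered from the index alone carries a (Key-Only) marker there.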

Answer 1

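So, I ran some tests.

TL;DR

Change the country column type to CHAR(32) and rebuild the index; you should see better performance.

Long version:

Using Informix 12.10FC6DE on Linux CentOS 7 (a VM created in VirtualBox). The dbspace page size is 2048 bytes and the buffer pool has 50000 pages.

Created a table (tst) with a row size of about 425 bytes (on average 4 rows per page) and several columns. Among those columns, one is country VARCHAR(32) and another is static_country CHAR(32). Populated the table with 499999 rows, with the country and static_country columns evenly distributed over 25 country names.

Created 2 indexes, one on the country column (idx1_tst) and the other on the static_country column (idx2_tst).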
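
The setup described above could be reproduced roughly as follows. This is a sketch, not the schema actually used in the test: only tst, country, static_country, idx1_tst and idx2_tst are names from this answer, while the id and filler columns are hypothetical padding chosen to bring the row size near 425 bytes.

```sql
-- Hypothetical reconstruction of the test table.
CREATE TABLE tst (
    id             SERIAL,         -- assumed surrogate key
    country        VARCHAR(32),    -- variable-length column under test
    static_country CHAR(32),       -- fixed-length column under test
    filler1        CHAR(120),      -- assumed padding columns to widen the row
    filler2        CHAR(120),
    filler3        CHAR(120)
);

-- One single-column index per variant, named as in this answer.
CREATE INDEX idx1_tst ON tst (country);
CREATE INDEX idx2_tst ON tst (static_country);

-- Assumed step: refresh optimizer statistics after loading the rows.
UPDATE STATISTICS MEDIUM FOR TABLE tst;
```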

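The table partition used 125000 data pages (per oncheck -pT). The indexes used about 1500 pages (per oncheck -pT).

A. Ran the query several times, forcing a SEQUENTIAL SCAN (run times between 10 and 15 seconds):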

SELECT --+ FULL (tst)
    country, COUNT(*)
FROM
    tst
GROUP BY
    country

DIRECTIVES FOLLOWED:
FULL ( tst )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 1415645
Estimated # of Rows Returned: 25
Temporary Files Required For: Group By

  1) mydb.tst: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                tst

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     499999     499999    499999     00:12.17   140001

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         25        499999     00:13.01   1275644

B. Ran the query several times, forcing an INDEX SCAN on the index over the country column, of type VARCHAR(32) (run times between 4 minutes 30 seconds and 5 minutes):

SELECT --+ INDEX (tst idx1_tst)
    country, COUNT(*)
FROM
    tst
GROUP BY
    country

DIRECTIVES FOLLOWED:
INDEX ( tst idx1_tst )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 3462411
Estimated # of Rows Returned: 25

  1) mydb.tst: INDEX PATH

    (1) Index Name: mydb.idx1_tst
        Index Keys: country   (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                tst

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     499999     499999    499999     04:49.71   3462411

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         25        499999     04:50.51   1275644

C. Ran the query several times, forcing an INDEX SCAN on the index over the static_country column, of type CHAR(32) (run times between 2 and 3 seconds):

SELECT --+ INDEX (tst idx2_tst)
    static_country, COUNT(*)
FROM
    tst
GROUP BY
    static_country

DIRECTIVES FOLLOWED:
INDEX ( tst idx2_tst )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 16428
Estimated # of Rows Returned: 25

  1) mydb.tst: INDEX PATH

    (1) Index Name: mydb.idx2_tst
        Index Keys: static_country   (Key-Only)  (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                tst

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     499999     499999    499999     00:02.02   16429

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         25        499999     00:02.72   1277132

Using the SMI table sysptprof in the sysmaster database, I can see the following counters (resetting them between runs with onstat -z):

Case A (SEQUENTIAL SCAN):
- table tst partition:
  - lockreqs 499999
  - isreads 125001
  - bufreads 500060
  - pagreads 117532
Case B (INDEX SCAN on the VARCHAR column):
- table tst partition:
  - lockreqs 499999
  - isreads 499990
  - bufreads 999997
  - pagreads 348585
- index idx1_tst partition:
  - lockreqs 499999
  - isreads 500009
  - bufreads 506961
  - pagreads 2545
Case C (INDEX SCAN on the CHAR column):
- index idx2_tst partition:
  - lockreqs 499999
  - isreads 500000
  - bufreads 502879
  - pagreads 1440
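
Counters like the ones listed above can be read with a query along these lines. It is a sketch: the database name mydb and the object names are taken from the EXPLAIN output in this answer, and it assumes the counters were reset with onstat -z just before a single run of the test query.

```sql
-- Per-partition profile counters for the table and its two indexes.
-- Run once after each test query; reset with `onstat -z` in between.
SELECT tabname, lockreqs, isreads, bufreads, pagreads
FROM   sysmaster:sysptprof
WHERE  dbsname = 'mydb'
  AND  tabname IN ('tst', 'idx1_tst', 'idx2_tst')
ORDER BY tabname;
```

A row for tst with non-zero reads during an index scan is the sign that the scan had to visit the data pages as well as the index.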

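So, for the SEQUENTIAL SCAN there is activity only on the table partition, as I expected.

For the INDEX SCAN on the CHAR column there is activity only on the index partition, as I expected (the explain output contains the Key-Only indication).

For the INDEX SCAN on the VARCHAR column there is activity on both the table and the index partitions, which is not what I expected (but, as Fernando pointed out, the explain output does not contain the Key-Only indication).

I cannot explain this behavior from Informix. But a colleague pointed me to this entry in the Informix Performance Guide (version 12.10FC6, chapter 10, Query plan, Access plan):

Important: The optimizer does not choose a key-only scan for a VARCHAR column. If you want to take advantage of key-only scans, use the ALTER TABLE statement with the MODIFY clause to change the column to a CHAR data type.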
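
Applied to the table from the question, the manual's advice might look like the sketch below. The names EVENTS, COUNTRY and country_ix come from the question. The sketch assumes the column has no NOT NULL or DEFAULT attributes, since MODIFY redefines the column and such attributes would have to be restated. On a large table this is a heavy operation, so it is worth trying on a copy first.

```sql
-- Sketch only: convert the grouped column to a fixed-length type.
-- Assumption: COUNTRY has no NOT NULL / DEFAULT attributes to restate.
ALTER TABLE EVENTS MODIFY (COUNTRY CHAR(32));

-- Refresh optimizer statistics for the changed column.
UPDATE STATISTICS MEDIUM FOR TABLE EVENTS (COUNTRY);

-- Re-check the plan: the Index Keys line should now show (Key-Only).
SET EXPLAIN ON;
select COUNTRY, COUNT(*) from EVENTS group by COUNTRY;
SET EXPLAIN OFF;
```

If the plan still lacks the Key-Only marker, dropping and recreating country_ix would correspond to the "rebuild the index" step in the TL;DR. Note the trade-off: a CHAR(32) value always occupies 32 bytes, so rows and index entries with short country names take more space than they did as VARCHAR.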

Answer 2

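Things I would try:

COUNT(1) instead of COUNT(*)
Test the query and check the execution plan without the index, since the index may be a source of confusion
Test the query with the index and try different index types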
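
The first two suggestions could be tried as in this sketch, which reuses the FULL optimizer directive shown in Answer 1 to bypass the index. The table and column names are from the question, and no result is implied for either variant.

```sql
SET EXPLAIN ON;

-- 1. COUNT(1) instead of COUNT(*)
select COUNTRY, COUNT(1) from EVENTS group by COUNTRY;

-- 2. Same query with the index bypassed via a full-scan directive
SELECT --+ FULL (EVENTS)
    COUNTRY, COUNT(*)
FROM
    EVENTS
GROUP BY
    COUNTRY;

SET EXPLAIN OFF;
```

Comparing the two plans and their timings shows whether the index path is actually helping for this query.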
