Always getting a Bitmap Heap Scan after a Bitmap Index Scan for JSON field queries

Time: 2018-07-18 19:13:42

Tags: json postgresql jsonb postgresql-9.5

I have the following index:

CREATE INDEX index_c_profiles_on_city_state_name_domain ON 
c_profiles ((data->>'state'), (data->>'city'), name, domain);

I am using the following query:

SELECT mm.name, mm.domain,
       mm.data ->> 'city' AS city,
       mm.data ->> 'state' AS state
FROM c_profiles AS mm
WHERE ((mm.data ->> 'state') = 'AZ')

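But when I test it with EXPLAIN ANALYZE, it always does a Bitmap Index Scan (good and fast) followed by a very slow Bitmap Heap Scan (usually about 100 times slower than the index scan alone).

I also tried indexing only the expression in the WHERE condition. The result is the same: after using the index, it still performs the very slow Bitmap Heap Scan.

Why does Postgres do this? How can I get a plain index scan so that this query runs fast?

Here is a sample EXPLAIN ANALYZE result: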

[
  {
    "Execution Time": 53.655,
    "Planning Time": 0.081,
    "Plan": {
      "Exact Heap Blocks": 1338,
      "Node Type": "Bitmap Heap Scan",
      "Actual Total Time": 53.031,
      "Shared Hit Blocks": 727,
      "Schema": "public",
      "Plans": [
        {
          "Node Type": "Bitmap Index Scan",
          "Actual Total Time": 0.455,
          "Shared Hit Blocks": 2,
          "Shared Read Blocks": 13,
          "Temp Written Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Hit Blocks": 0,
          "Plan Width": 0,
          "Actual Loops": 1,
          "Actual Startup Time": 0.455,
          "Temp Read Blocks": 0,
          "Local Read Blocks": 0,
          "Index Name": "index_mattermark_profiles_on_city_state_name_domain",
          "Startup Cost": 0,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 0,
          "Local Written Blocks": 0,
          "Plan Rows": 788,
          "Index Cond": "((mm.data ->> 'state'::text) = 'AZ'::text)",
          "Actual Rows": 1417,
          "Parent Relationship": "Outer",
          "Total Cost": 34.33
        }
      ],
      "Shared Read Blocks": 650,
      "Relation Name": "mattermark_profiles",
      "Local Hit Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Temp Written Blocks": 0,
      "Plan Width": 1010,
      "Actual Loops": 1,
      "Rows Removed by Index Recheck": 0,
      "Lossy Heap Blocks": 0,
      "Alias": "mm",
      "Recheck Cond": "((mm.data ->> 'state'::text) = 'AZ'::text)",
      "Temp Read Blocks": 0,
      "Output": [
        "name",
        "domain",
        "(data ->> 'city'::text)",
        "(data ->> 'state'::text)"
      ],
      "Actual Startup Time": 0.703,
      "Local Read Blocks": 0,
      "Startup Cost": 34.53,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Written Blocks": 0,
      "Plan Rows": 788,
      "Actual Rows": 1417,
      "Total Cost": 2894.17
    },
    "Triggers": []
  }
]

1 Answer:

Answer 0 (score: 2)

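PostgreSQL chose a bitmap index scan over a regular index scan because it estimated that it would be faster.

That is typically the case when the estimated number of result rows is high.

A regular index scan has to visit the table for every index entry it finds, which causes a lot of random I/O on the table and may require fetching the same block several times.

A bitmap index scan works by first finding all matching index entries, then sorting them by their physical location in the table, and then fetching the required blocks from the table. This is more efficient because the table blocks are read in physical order and each block is visited only once.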
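To get a feel for how scattered the matching rows are, one could count the distinct heap blocks that hold them. This is only a sketch: it reuses the table and expression from the question, and it relies on casting the system column ctid through text to point to extract the block number, which is a common trick rather than a dedicated interface.

```sql
-- Count the matching rows and the distinct heap blocks that contain them.
-- ctid has the form (block, offset); casting it via text to point lets
-- us pick out the block number with the [0] subscript.
SELECT count(*) AS matching_rows,
       count(DISTINCT (ctid::text::point)[0]) AS heap_blocks
FROM c_profiles
WHERE (data ->> 'state') = 'AZ';
```

In the plan from the question, 1417 rows come from 1338 exact heap blocks, so nearly every matching row sits in a block of its own.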

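The second step is the bitmap heap scan, which shows up as its own node in the EXPLAIN output and is usually the more expensive of the two.

Everything seems to be in order.

You can try setting enable_bitmapscan to off to see whether PostgreSQL is right and the resulting plan really is more expensive.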
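A minimal sketch of that experiment, assuming the table and query from the question and a session in which changing planner settings is acceptable:

```sql
-- Disable bitmap scans for the current session only.
SET enable_bitmapscan = off;

-- Re-run the query; BUFFERS shows how many blocks came from cache
-- (shared hit) versus from disk (shared read).
EXPLAIN (ANALYZE, BUFFERS)
SELECT mm.name, mm.domain,
       mm.data ->> 'city' AS city,
       mm.data ->> 'state' AS state
FROM c_profiles AS mm
WHERE (mm.data ->> 'state') = 'AZ';

-- Restore the default planner setting.
RESET enable_bitmapscan;
```

Since the plan in the question reports 650 shared read blocks, it is probably worth running each variant more than once so that a cold cache does not skew the comparison.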