Question

I am running a SELECT query with a full outer join between 2 tables that live in 2 different databases. I am using PostgreSQL 9.6.

Even though we have set the following parameters, the query does not use parallelism:

work_mem=256MB,
max_worker_processes=40,
force_parallel_mode=on,
max_parallel_workers_per_gather=4,
parallel_tuple_cost=0.1,
parallel_setup_cost=1000,
min_parallel_relation_size=8MB

Here is the query:

SELECT mea.ocs_cdr_type,
       mea.ocs_time_stamp,
       mea.sum_ocs_call_cost,
       ctr.ctr_name
FROM mea_req_54 mea
   FULL OUTER JOIN country ctr ON mea.ocs_imei  = ctr.ctr_name;

Here is the definition of mea_req_54:

                      Table "public.mea_req_54"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 mer_id                     | numeric(19,0)               | not null
 mer_from_dttm              | timestamp without time zone | not null
 mer_to_dttm                | timestamp without time zone | not null
 fng_id                     | numeric(19,0)               |
 ocs_imsi_number_norm       | character varying(255)      |
 ocs_account_number         | character varying(255)      |
 ocs_charging_id            | character varying(255)      |
 ocs_cdr_type               | character varying(255)      |
 ocs_bit_description        | character varying(255)      |
 ocs_time_stamp_raw         | timestamp without time zone |
 ocs_time_stamp             | timestamp without time zone |
 ocs_duration               | numeric(10,0)               |
 ocs_duration_str           | character varying(255)      |
 ocs_upload_volume          | numeric(19,0)               |
 ocs_download_volume        | numeric(19,0)               |
 sum_ocs_total_volume       | numeric(19,0)               |
 sum_ocs_call_cost          | numeric(19,0)               |
 ocs_plmn_identifier        | character varying(255)      |
 ocs_imei                   | character varying(255)      |
 ocs_user_loc_info          | character varying(255)      |
 ocs_bp_id                  | character varying(255)      |
 ocs_ref_spec_from_contract | character varying(255)      |
 ocs_subapp_in_contract_acc | character varying(255)      |
 ocs_baseline_date_bill     | timestamp without time zone |
 ocs_target_date_bill       | timestamp without time zone |
 ocs_date_of_origin_bill    | timestamp without time zone |
 ctr_id                     | numeric(10,0)               |
 ctr_iso_cd                 | character varying(255)      |
 ctr_name                   | character varying(255)      |
 dblink_run                 | numeric(10,0)               |
Indexes:
    "mea_req_54_pk" UNIQUE, btree (mer_id)

Here is the definition of country:

                  Table "public.country"
       Column        |          Type          | Modifiers
---------------------+------------------------+-----------
 ctr_id              | numeric(10,0)          | not null
 ctr_iso_cd          | character varying(255) | not null
 ctr_name            | character varying(255) | not null
 system_generated_fl | character(1)           |
 ctr_delete_fl       | character(1)           | not null
 ctr_dial_code       | character varying(255) | not null
 ctr_version_id      | numeric(10,0)          | not null
 ptn_id              | numeric(10,0)          | not null
Indexes:
    "country_ak" UNIQUE, btree (ctr_name)
    "country_pk" UNIQUE, btree (ctr_id)
    "country_ss1" UNIQUE, btree (ctr_iso_cd)

Here is the execution plan:

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=482.50..14564568.50 rows=300000000 width=29) (actual time=8.810..305863.949 rows=300015000 loops=1)
   Hash Cond: ((mea.ocs_imei)::text = (ctr.ctr_name)::text)
   ->  Seq Scan on mea_req_54 mea  (cost=0.00..10439086.00 rows=300000000 width=19) (actual time=0.005..131927.791 rows=300000000 loops=1)
   ->  Hash  (cost=295.00..295.00 rows=15000 width=13) (actual time=8.784..8.784 rows=15000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 791kB
         ->  Seq Scan on country ctr  (cost=0.00..295.00 rows=15000 width=13) (actual time=0.008..4.138 rows=15000 loops=1)
 Planning time: 0.085 ms
 Execution time: 355065.791 ms
(8 rows)

Answer 1

The documentation doesn't mention it, but in backend/optimizer/path/joinpath.c, in the function hash_inner_and_outer, I found this illuminating comment:

/*
 * If the joinrel is parallel-safe, we may be able to consider a
 * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
 * because the outer path will be partial, and therefore we won't be
 * able to properly guarantee uniqueness.  Similarly, we can't handle
 * JOIN_FULL and JOIN_RIGHT, because they can produce false null
 * extended rows.  Also, the resulting path must not be parameterized.
 */

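That makes sense: a parallel worker that scans only part of mea_req_54 cannot know whether a row in country matches some row of mea_req_54 that a different worker is scanning, so it cannot decide on its own whether that country row should be emitted null-extended.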
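To see why the comment rules out JOIN_FULL, here is a toy Python model (not PostgreSQL code; the function names and sample keys are hypothetical). hash_full_join is an ordinary serial hash full join. naive_parallel_full_join splits the outer side across workers, the way a partial scan of mea_req_54 would, while every worker still sees the whole inner side. Each worker then emits the inner rows it did not match as null-extended, even when another worker did match them: these are the "false null extended rows" the comment warns about.

```python
from collections import Counter


def hash_full_join(outer, inner):
    """Full outer join of two lists of join keys on equality.

    Each output row is (outer_key, inner_key); None stands for SQL NULL on the
    side that had no match (a "null-extended" row).
    """
    buckets = {}
    for key in inner:                        # build phase: hash the inner side
        buckets.setdefault(key, []).append(key)
    matched = set()
    rows = []
    for key in outer:                        # probe phase: scan the outer side
        if key in buckets:
            matched.add(key)
            rows.extend((key, inner_key) for inner_key in buckets[key])
        else:
            rows.append((key, None))         # outer row with no partner
    for key in inner:                        # emit inner rows nobody matched
        if key not in matched:
            rows.append((None, key))
    return rows


def naive_parallel_full_join(outer, inner, workers):
    """Hypothetical broken plan: each worker joins a slice of the outer side
    against the whole inner side, independently of the other workers."""
    rows = []
    for w in range(workers):
        rows.extend(hash_full_join(outer[w::workers], inner))
    return rows


if __name__ == "__main__":
    # Hypothetical sample keys standing in for mea.ocs_imei / ctr.ctr_name.
    outer = ["DE", "FR", "XX"]
    inner = ["DE", "FR", "IT"]
    serial = Counter(hash_full_join(outer, inner))
    parallel = Counter(naive_parallel_full_join(outer, inner, workers=2))
    print("serial result:", serial)
    print("spurious rows:", parallel - serial)
```

With two workers, the serial join returns 4 rows, while the naive split returns 7: each worker reports the country keys matched only by the other worker (and the unmatched one) as unmatched.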

Nested loop joins cannot be used for full outer joins, so the only remaining candidate is a parallel merge join.

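I can't say whether a merge join is an option here, but you could try creating an index on mea_req_54(ocs_imei) and see whether that helps the optimizer choose a parallel plan.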

Otherwise, you may be out of luck.

Answer 2

After I lowered the parallel_tuple_cost and parallel_setup_cost parameters below their default values, the query ran in parallel.

But I would like to know: what exactly do these parameters mean?
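For reference, the PostgreSQL documentation describes parallel_setup_cost as the planner's estimated cost of launching parallel worker processes (default 1000) and parallel_tuple_cost as the estimated cost of transferring one tuple from a parallel worker to another process (default 0.1). The sketch below is a simplified Python model of how the planner adds them to a Gather node, after cost_gather() in src/backend/optimizer/path/costsize.c. The function name gather_cost and the lowered settings tried are hypothetical, and the subpath cost is treated as an input rather than computed. With the defaults and the plan's estimate of 300,000,000 rows, the transfer overhead alone (about 30,001,000) exceeds the total cost of the whole serial plan (14,564,568.50), so any plan that gathers every row looks more expensive. Lowering the parameters removes that barrier; this is necessary but not sufficient for a parallel plan to win.

```python
# Simplified model of the Gather cost formula, after cost_gather() in
# PostgreSQL's src/backend/optimizer/path/costsize.c. Subpath costs are
# inputs here; the real planner derives them from the partial path below.

DEFAULT_PARALLEL_SETUP_COST = 1000.0  # documented default
DEFAULT_PARALLEL_TUPLE_COST = 0.1     # documented default


def gather_cost(sub_startup, sub_total, rows,
                parallel_setup_cost=DEFAULT_PARALLEL_SETUP_COST,
                parallel_tuple_cost=DEFAULT_PARALLEL_TUPLE_COST):
    """Return (startup_cost, total_cost) of a Gather over a partial subpath."""
    # Launching workers is charged once, before the first row comes back.
    startup = sub_startup + parallel_setup_cost
    # Every row handed from a worker to the leader costs parallel_tuple_cost.
    run = (sub_total - sub_startup) + parallel_tuple_cost * rows
    return startup, startup + run


if __name__ == "__main__":
    ROWS = 300_000_000               # estimated rows from the plan above
    SERIAL_PLAN_TOTAL = 14564568.50  # total cost of the serial Hash Full Join

    # Hypothetical settings: the defaults, then two lowered variants.
    for setup, per_tuple in [(1000.0, 0.1), (100.0, 0.01), (10.0, 0.001)]:
        # With a zero-cost subpath, the total is the Gather overhead alone.
        _, overhead = gather_cost(0.0, 0.0, ROWS, setup, per_tuple)
        print(f"setup={setup:g} tuple={per_tuple:g} "
              f"overhead={overhead:,.0f} "
              f"below serial total: {overhead < SERIAL_PLAN_TOTAL}")
```

Because the per-row charge is multiplied by the row count, parallel_tuple_cost dominates for a query that returns hundreds of millions of rows, while parallel_setup_cost matters mostly for small queries.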

PostgreSQL - not using parallelism
