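I am running a SELECT query that uses a full outer join between 2 tables, which live in 2 different databases. I am using PostgreSQL 9.6.
The query does not run in parallel even though we have set the following parameters: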
work_mem=256MB,
max_worker_processes=40,
force_parallel_mode=on,
max_parallel_workers_per_gather=4,
parallel_tuple_cost=0.1,
parallel_setup_cost=1000,
min_parallel_relation_size=8MB
This is the query:
SELECT mea.ocs_cdr_type,
mea.ocs_time_stamp,
mea.sum_ocs_call_cost,
ctr.ctr_name
FROM mea_req_54 mea
FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;
This is mea_req_54:
Table "public.mea_req_54"
Column | Type | Modifiers
----------------------------+-----------------------------+-----------
mer_id | numeric(19,0) | not null
mer_from_dttm | timestamp without time zone | not null
mer_to_dttm | timestamp without time zone | not null
fng_id | numeric(19,0) |
ocs_imsi_number_norm | character varying(255) |
ocs_account_number | character varying(255) |
ocs_charging_id | character varying(255) |
ocs_cdr_type | character varying(255) |
ocs_bit_description | character varying(255) |
ocs_time_stamp_raw | timestamp without time zone |
ocs_time_stamp | timestamp without time zone |
ocs_duration | numeric(10,0) |
ocs_duration_str | character varying(255) |
ocs_upload_volume | numeric(19,0) |
ocs_download_volume | numeric(19,0) |
sum_ocs_total_volume | numeric(19,0) |
sum_ocs_call_cost | numeric(19,0) |
ocs_plmn_identifier | character varying(255) |
ocs_imei | character varying(255) |
ocs_user_loc_info | character varying(255) |
ocs_bp_id | character varying(255) |
ocs_ref_spec_from_contract | character varying(255) |
ocs_subapp_in_contract_acc | character varying(255) |
ocs_baseline_date_bill | timestamp without time zone |
ocs_target_date_bill | timestamp without time zone |
ocs_date_of_origin_bill | timestamp without time zone |
ctr_id | numeric(10,0) |
ctr_iso_cd | character varying(255) |
ctr_name | character varying(255) |
dblink_run | numeric(10,0) |
Indexes:
"mea_req_54_pk" UNIQUE, btree (mer_id)
This is country:
Table "public.country"
Column | Type | Modifiers
---------------------+------------------------+-----------
ctr_id | numeric(10,0) | not null
ctr_iso_cd | character varying(255) | not null
ctr_name | character varying(255) | not null
system_generated_fl | character(1) |
ctr_delete_fl | character(1) | not null
ctr_dial_code | character varying(255) | not null
ctr_version_id | numeric(10,0) | not null
ptn_id | numeric(10,0) | not null
Indexes:
"country_ak" UNIQUE, btree (ctr_name)
"country_pk" UNIQUE, btree (ctr_id)
"country_ss1" UNIQUE, btree (ctr_iso_cd)
This is the execution plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Hash Full Join (cost=482.50..14564568.50 rows=300000000 width=29) (actual time=8.810..305863.949 rows=300015000 loops=1)
Hash Cond: ((mea.ocs_imei)::text = (ctr.ctr_name)::text)
-> Seq Scan on mea_req_54 mea (cost=0.00..10439086.00 rows=300000000 width=19) (actual time=0.005..131927.791 rows=300000000 loops=1)
-> Hash (cost=295.00..295.00 rows=15000 width=13) (actual time=8.784..8.784 rows=15000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 791kB
-> Seq Scan on country ctr (cost=0.00..295.00 rows=15000 width=13) (actual time=0.008..4.138 rows=15000 loops=1)
Planning time: 0.085 ms
Execution time: 355065.791 ms
(8 rows)
Answer 0 (score: 1)
The documentation does not mention it, but in backend/optimizer/path/joinpath.c, in the function hash_inner_and_outer, I found the following enlightening comment:
/*
* If the joinrel is parallel-safe, we may be able to consider a
* partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
* because the outer path will be partial, and therefore we won't be
* able to properly guarantee uniqueness. Similarly, we can't handle
* JOIN_FULL and JOIN_RIGHT, because they can produce false null
* extended rows. Also, the resulting path must not be parameterized.
*/
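That makes sense: a parallel worker that scans only part of mea_req_54 cannot know whether a row in country has a match somewhere in the rest of mea_req_54, so it could emit a null-extended row for a country row that another worker has actually matched.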
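To make the "false null-extended rows" problem concrete, here is a minimal sketch that imitates two workers with plain SQL. The tables outer_rows and inner_rows are hypothetical toy stand-ins for mea_req_54 and country, and the split into worker1 and worker2 is only a simulation of what a partial hash full join would do, not how PostgreSQL executes anything:

```sql
-- Toy data: hypothetical stand-ins for mea_req_54 (outer) and country (inner).
WITH outer_rows(id, k) AS (VALUES (1, 'A'), (2, 'B')),
     inner_rows(k)     AS (VALUES ('A'), ('B')),
-- Each simulated worker sees only part of the outer table,
-- but the complete inner table (the hash table).
worker1 AS (
    SELECT o.id, i.k
    FROM (SELECT * FROM outer_rows WHERE id = 1) o
    FULL OUTER JOIN inner_rows i ON o.k = i.k
),
worker2 AS (
    SELECT o.id, i.k
    FROM (SELECT * FROM outer_rows WHERE id = 2) o
    FULL OUTER JOIN inner_rows i ON o.k = i.k
)
-- Combining the per-worker results is what a Gather node would do.
SELECT 'worker1' AS src, * FROM worker1
UNION ALL
SELECT 'worker2', * FROM worker2;
```

A serial full join of these toy tables returns two rows, (1, A) and (2, B). The combined result above has four: each simulated worker also reports the inner row it did not match as a null-extended row, although the other worker matched it.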
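Nested loop joins cannot be used for full outer joins, so the only thing left is a parallel merge join.
I can't say whether a merge join is an option here, but you could try creating an index on mea_req_54(ocs_imei) and see whether that helps the optimizer choose a parallel plan.
Otherwise, you may be out of luck.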
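A minimal sketch of that experiment, assuming the index name mea_req_54_ocs_imei_idx (made up for this example). Building an index on a 300-million-row table will take a while, and there is no guarantee that the planner will pick a different plan afterwards:

```sql
-- Index on the join column; the index name is hypothetical.
CREATE INDEX mea_req_54_ocs_imei_idx ON mea_req_54 (ocs_imei);

-- Refresh planner statistics for the table.
ANALYZE mea_req_54;

-- Plain EXPLAIN shows the chosen plan without running the query.
EXPLAIN
SELECT mea.ocs_cdr_type,
       mea.ocs_time_stamp,
       mea.sum_ocs_call_cost,
       ctr.ctr_name
FROM mea_req_54 mea
FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;
```

If a Gather node appears in the output, the planner has chosen a parallel plan; if the plan is still a serial Hash Full Join, the index did not help.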
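Answer 1 (score: 0)
After lowering the parallel_tuple_cost and parallel_setup_cost parameters from their default values, the query ran in parallel.
But I would like to know what exactly these parameters do.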
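As a pointer, the PostgreSQL documentation describes parallel_setup_cost as the planner's estimate of the cost of launching parallel worker processes (default 1000) and parallel_tuple_cost as its estimate of the cost of transferring one tuple from a parallel worker process to another process (default 0.1). This query returns about 300 million rows, so if every row has to pass through a Gather node, the default per-tuple charge alone comes to roughly 0.1 × 300,000,000 = 30,000,000 cost units, more than the serial plan's total estimate of 14,564,568.50. That may be why lowering the values changed the plan. A session-level sketch, where the values are illustrative assumptions and not the ones actually used:

```sql
-- Illustrative values only; the values actually used are not stated above.
SET parallel_setup_cost = 100;    -- default is 1000
SET parallel_tuple_cost = 0.001;  -- default is 0.1

-- Check whether the planner now chooses a plan with a Gather node.
EXPLAIN
SELECT mea.ocs_cdr_type,
       mea.ocs_time_stamp,
       mea.sum_ocs_call_cost,
       ctr.ctr_name
FROM mea_req_54 mea
FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;

-- Restore the defaults for this session afterwards.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
```

Lowering these costs only changes the planner's estimates. Moving 300 million rows from the workers to the leader process is still real work, so it is worth comparing actual run times with EXPLAIN ANALYZE before keeping such settings.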