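I'm running an overnight query against a Vertica cluster and would like to know what I can do to improve its performance. The query joins a small table (21k rows) containing a list of IDs to a large table (> 14 billion rows) on the matching field (pat_id). I want to return every row of the large table whose ID appears in my small reference table. The large table is stored as external Parquet files. Any suggestions would be much appreciated.
The EXPLAIN output is as follows: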
explain select ims.* from juv_arthritis_pts juv join dwdev1_data.IMS_claims ims on juv.pat_id = ims.pat_id
Access Path:
+-JOIN HASH [Cost: 2K, Rows: 21K (NO STATISTICS)] (PATH ID: 1) Outer (RESEGMENT)(LOCAL ROUND ROBIN)
| Join Cond: (juv.pat_id = ims.pat_id)
| Execute on: v_dwp1_node0001, v_dwp1_node0002, v_dwp1_node0003, v_dwp1_node0005
| +-- Outer -> LOAD EXTERNAL TABLE [Cost: 0, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Table: IMS_claims
| | copy from '/mapr/mapr.XXX.local/Environments/svc.dwdev1/data/ims_claims.final/*/*' PARQUET
| | Execute on: Query Initiator
| +-- Inner -> STORAGE ACCESS for juv [Cost: 33, Rows: 21K] (PATH ID: 3)
| | Projection: X.juv_arthritis_pts_b0
| | Materialize: X.pat_id
| | Execute on: v_dwp1_node0001, v_dwp1_node0002, v_dwp1_node0003
| +---> STORAGE ACCESS for juv (REPLACEMENT FOR DOWN NODE) [Cost: 49, Rows: 21K]
| | Projection: X.juv_arthritis_pts_b1
| | Materialize: juv.pat_id
| | Execute on: v_dwp1_node0001, v_dwp1_node0005