Question

The setup is Postgres-XL 9.5r1.6 with a GTM, a coordinator, and two datanodes.

Three tables, a, b, and c, implement a many-to-many relationship:

create table a(id int, name text, uid int) distribute by hash(uid);
create table b(id int, name text, uid int) distribute by hash(uid);
create table c(id int, aname text, bname text, uid int) distribute by hash(uid);

When the following query is executed on the coordinator, it takes an inexplicable 20000 ms! Yet on each of the two datanodes the execution time rarely exceeds 20 ms.

select a.name, b.name
from
       a left join c
       on a.name=c.aname
          left join b
          on c.bname=b.name
where
       a.name='cf82c96b77b8aa5277da6d55c4e4e66e';

EXPLAIN plan on the coordinator:

Remote Subquery Scan on all (dn_1,dn_2)  (cost=8.33..17.78 rows=1 width=66)
 ->  Nested Loop Left Join  (cost=8.33..17.78 rows=1 width=66)
         Join Filter: ((a.name)::text = (c.aname)::text)
         ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=100.15..108.21 rows=1 width=33)
               Distribute results by H: name
               ->  Index Only Scan using code_idx on a  (cost=0.15..8.17 rows=1 width=33)
                     Index Cond: (name = 'cf82c96b77b8aa5277da6d55c4e4e66e'::text)
         ->  Materialize  (cost=108.18..109.72 rows=1 width=115)
               ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=108.18..109.72 rows=1 width=115)
                     Distribute results by H: aname
                     ->  Hash Right Join  (cost=8.18..9.60 rows=1 width=115)
                           Hash Cond: ((b.name)::text = (c.bname)::text)
                           ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=100.00..102.44 rows=30 width=33)
                                 Distribute results by H: name
                                 ->  Seq Scan on b  (cost=0.00..1.30 rows=30 width=33)
                           ->  Hash  (cost=108.41..108.41 rows=1 width=244)
                                 ->  Remote Subquery Scan on all (dn_1,dn_2)  (cost=100.15..108.41 rows=1 width=244)
                                       Distribute results by H: bname
                                       ->  Index Only Scan using code_idxcfc on c  (cost=0.15..8.17 rows=1 width=244)
                                             Index Cond: (aname = 'cf82c96b77b8aa5277da6d55c4e4e66e'::text)

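Others have already run into this problem and asked about it here, but received no answer or hint. I hope the question gets some attention this time.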

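PS: I tried populating the three tables so that related rows from tables a, c, and b all come from the same datanode, but the execution time did not improve. Another point worth noting: when the condition in the where clause is always false, the execution time drops to a few milliseconds.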
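A minimal sketch of that kind of co-located load, assuming related rows are given the same uid so that hash(uid) routes them to the same datanode. The ids, the b name, and the uid value are made up for illustration; xc_node_id is the hidden node-identifier column in Postgres-XL, used here only to check placement.

```sql
-- Hypothetical sample rows: every related row carries uid = 42,
-- so DISTRIBUTE BY HASH(uid) places them on the same datanode.
INSERT INTO a (id, name, uid)
VALUES (1, 'cf82c96b77b8aa5277da6d55c4e4e66e', 42);

INSERT INTO b (id, name, uid)
VALUES (1, 'example_b_name', 42);

INSERT INTO c (id, aname, bname, uid)
VALUES (1, 'cf82c96b77b8aa5277da6d55c4e4e66e', 'example_b_name', 42);

-- Check placement: the three queries should report the same xc_node_id.
SELECT xc_node_id, id, uid FROM a WHERE uid = 42;
SELECT xc_node_id, id, uid FROM b WHERE uid = 42;
SELECT xc_node_id, id, uid FROM c WHERE uid = 42;
```

The join conditions in the query do not mention uid, so even with this layout the planner probably cannot tell that matching rows are co-located. That would be consistent with the "Distribute results by H: name" steps in the plan above.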

Answer 1

For this query:

select a.name, b.name
from a left join
     c
     on a.name = c.aname left join
     b
     on c.bname = b.name
where a.name = 'cf82c96b77b8aa5277da6d55c4e4e66e';

You want indexes on a(name), b(name), and c(name). Your partitioning will not help this query, and you should keep it only if the tables are very large.
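A sketch of the corresponding index definitions. The index names are hypothetical, and since c has no name column, the index on c is assumed here to cover the join columns aname and bname. The plan in the question already shows index scans on a and c (code_idx, code_idxcfc), so some of these may duplicate existing indexes; IF NOT EXISTS only guards against a name clash.

```sql
-- Hypothetical index names; columns taken from the join and filter conditions.
CREATE INDEX IF NOT EXISTS a_name_idx ON a (name);
CREATE INDEX IF NOT EXISTS b_name_idx ON b (name);

-- Assumption: c has no "name" column, so index its join columns instead.
CREATE INDEX IF NOT EXISTS c_aname_bname_idx ON c (aname, bname);

-- Refresh planner statistics after creating the indexes.
ANALYZE a;
ANALYZE b;
ANALYZE c;
```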

Answer 2

This is caused by the nested loop; set it to false. XL will then use a hash join and return the result quickly.
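Assuming "it" refers to the standard PostgreSQL planner setting enable_nestloop, a session-level sketch would be:

```sql
-- Discourage nested-loop joins for the current session only.
SET enable_nestloop = off;

-- Re-run the query from the question and compare the plan and timing.
EXPLAIN ANALYZE
select a.name, b.name
from a left join
     c
     on a.name = c.aname left join
     b
     on c.bname = b.name
where a.name = 'cf82c96b77b8aa5277da6d55c4e4e66e';

-- Restore the default afterwards.
RESET enable_nestloop;
```

Turning the setting off does not forbid nested loops outright; it makes the planner prefer other join methods whenever one is available.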

Postgres-XL left join takes too long to execute
