Postgres_FDW not pushing down WHERE criteria

Date: 2018-05-03 22:44:35

Tags: postgresql postgres-fdw postgres-9.6

I am working with two PostgreSQL 9.6 databases and trying to query one from the other using postgres_fdw (one is a production backup database with the data, the other is a database used for various analyses).

I'm running into some strange behavior where certain kinds of WHERE clauses in a query are not passed to the remote database, but are instead kept on the local database and used to filter the results received from the remote side. This causes the remote database to send far more data over the network than the local database actually needs, and the affected queries are dramatically slower (15 seconds vs. 15 minutes).

I've mostly seen this with timestamp-related clauses; the example below is where I first ran into the issue, but I've seen it in several other variants, e.g. replacing CURRENT_TIMESTAMP with a TIMESTAMP literal (slow) or with a TIMESTAMP WITH TIME ZONE literal (fast).

Is there a setting somewhere that would help with this? I'm setting this up for a team with mixed levels of SQL background, most of whom have no experience with EXPLAIN plans and the like. I've come up with some workarounds (e.g. putting the relative time clause in a sub-SELECT, as in the second example below), but I keep running into new instances of the problem.

An example:

SELECT      var_1
           ,var_2
FROM        schema_A.table_A
WHERE       execution_ts <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
        AND execution_ts >= CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour'
ORDER BY    var_1

Explain plan:

Sort  (cost=147.64..147.64 rows=1 width=1048)
  Output: table_A.var_1, table_A.var_2
  Sort Key: (table_A.var_1)::text
  ->  Foreign Scan on schema_A.table_A  (cost=100.00..147.63 rows=1 width=1048)
        Output: table_A.var_1, table_A.var_2
        Filter: ((table_A.execution_ts <= (now() - '01:00:00'::interval)) 
             AND (table_A.execution_ts >= ((now() - '7 days'::interval) - '01:00:00'::interval)))
        Remote SQL: SELECT var_1, execution_ts FROM model.table_A
                    WHERE ((model_id::text = 'ABCD'::text))
                      AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))

The query above takes roughly 15-20 minutes to run, while the one below takes a few seconds.

SELECT      var_1
           ,var_2
FROM        schema_A.table_A
WHERE       execution_ts <= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 hour')
        AND execution_ts >= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour')
ORDER BY    var_1

Explain plan:

Sort  (cost=158.70..158.71 rows=1 width=16)
  Output: table_A.var_1, table_A.var_2
  Sort Key: table_A.var_1
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=8)
          Output: (now() - '01:00:00'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8)
          Output: ((now() - '7 days'::interval) - '01:00:00'::interval)
  ->  Foreign Scan on schema_A.table_A  (cost=100.00..158.66 rows=1 width=16)
        Output: table_A.var_1, table_A.var_2
        Remote SQL: SELECT var_1, var_2 FROM model.table_A
                    WHERE ((execution_ts <= $1::timestamp with time zone))
                      AND ((execution_ts >= $2::timestamp with time zone))
                      AND ((model_id::text = 'ABCD'::text))
                      AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))

2 answers:

Answer 0 (score: 4)

Any functions that are not IMMUTABLE will not be pushed down. CURRENT_TIMESTAMP (now()) is only STABLE, so expressions built on it are kept on the local side.

See the is_foreign_expr function in:

contrib/postgres_fdw/deparse.c
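
You can check a function's volatility in the pg_proc catalog. A minimal sketch (the functions being compared are just illustrative picks):

-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile.
-- now() / CURRENT_TIMESTAMP is only STABLE (fixed per transaction, not
-- derivable from its arguments alone), so postgres_fdw will not ship
-- expressions containing it to the remote server.
SELECT proname, provolatile
FROM   pg_proc
WHERE  proname IN ('now', 'lower', 'random');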

Answer 1 (score: 1)

I think the problem is what CURRENT_TIMESTAMP resolves to when executed (now()).
The value CURRENT_TIMESTAMP returns is fixed for the current transaction, which means it must be evaluated locally.
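
A quick sketch demonstrating that behavior in psql (pg_sleep is only there to make the freeze visible):

BEGIN;
SELECT current_timestamp;  -- some time t0
SELECT pg_sleep(2);
SELECT current_timestamp;  -- still exactly t0: frozen at transaction start
SELECT clock_timestamp();  -- true wall-clock time, about 2 seconds later
COMMIT;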

Wrapping it in a sub-select forces it to a constant now() value, which allows the comparison to be evaluated by remote execution.
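
The same effect can be achieved by evaluating the boundary first and handing the query a plain constant. A sketch using psql variables (a driver-level bind parameter works the same way; table and column names are taken from the question above):

-- Evaluate the boundary once, locally, and capture it:
SELECT now() - interval '1 hour' AS upper_bound \gset
-- The filter now contains only a constant, so it can be pushed down:
SELECT var_1, var_2
FROM   schema_A.table_A
WHERE  execution_ts <= :'upper_bound'
ORDER BY var_1;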

With a plain timestamp literal, a conversion between timestamp and timestamptz is required. That conversion follows the current time zone rules (per SET TIME ZONE TO ...), and the database chooses to convert the remote timestamptz values to local time for the comparison, which again must happen locally. A timestamptz literal needs no such conversion, so the filter can be shipped as-is.
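
That matches the behavior noted in the question: a TIMESTAMP WITH TIME ZONE literal is already a bare constant of the column's type. A sketch with made-up boundary values:

-- A timestamptz literal requires no local time-zone conversion, so the
-- whole predicate can be sent to the remote server verbatim:
SELECT var_1, var_2
FROM   schema_A.table_A
WHERE  execution_ts <= TIMESTAMPTZ '2018-05-03 21:44:35+00'
  AND  execution_ts >= TIMESTAMPTZ '2018-04-26 21:44:35+00'
ORDER BY var_1;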

In general, timestamp should be avoided (use timestamptz instead), unless you:

  1. want the value to follow any changes made to daylight saving time rules, and
  2. are certain you never need to represent a timestamp that falls inside the repeated hour of the fall-back transition (see the sketch below).
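
A sketch of that fall-back ambiguity (America/New_York is just an assumed example zone; any zone with DST behaves the same):

SET TIME ZONE 'America/New_York';
-- On 2018-11-04 the wall-clock hour 01:00-02:00 occurred twice (once in
-- EDT, once in EST). A plain timestamp cannot say which instant it means,
-- so Postgres has to pick one interpretation when casting:
SELECT TIMESTAMP '2018-11-04 01:30:00'::timestamptz;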