I'm working with two PostgreSQL 9.6 databases and trying to query one from the other using postgres_fdw (one is a production backup holding the data, the other is a database used for various analyses).
I've run into some odd behavior where certain kinds of WHERE clauses are not passed to the remote database; instead they are applied locally to filter the rows received from it. This causes the remote server to send far more data over the network than the local database actually needs, and the affected queries are dramatically slower (15 seconds vs. 15 minutes).
I've mostly seen this with timestamp-related clauses; the example below is where I first hit the problem, but I've seen it in several other variants, e.g. replacing CURRENT_TIMESTAMP with a TIMESTAMP literal (slow) or with a TIMESTAMP WITH TIME ZONE literal (fast).
Is there a setting somewhere that would help with this? I'm setting this up for a team with mixed levels of SQL background, most of whom have no experience with EXPLAIN plans and the like. I've come up with some workarounds (e.g. putting the relative-time clauses in a sub-SELECT), but I keep running into new instances of the problem.
An example:
SELECT var_1
,var_2
FROM schema_A.table_A
WHERE execution_ts <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND execution_ts >= CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour'
ORDER BY var_1
EXPLAIN plan:
Sort (cost=147.64..147.64 rows=1 width=1048)
Output: table_A.var_1, table_A.var_2
Sort Key: (table_A.var_1)::text
-> Foreign Scan on schema_A.table_A (cost=100.00..147.63 rows=1 width=1048)
Output: table_A.var_1, table_A.var_2
Filter: ((table_A.execution_ts <= (now() - '01:00:00'::interval))
AND (table_A.execution_ts >= ((now() - '7 days'::interval) - '01:00:00'::interval)))
Remote SQL: SELECT var_1, execution_ts FROM model.table_A
WHERE ((model_id::text = 'ABCD'::text))
AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))
The query above takes roughly 15-20 minutes to run, while the following takes only a few seconds:
SELECT var_1
,var_2
FROM schema_A.table_A
WHERE execution_ts <= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 hour')
AND execution_ts >= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour')
ORDER BY var_1
EXPLAIN plan:
Sort (cost=158.70..158.71 rows=1 width=16)
Output: table_A.var_1, table_A.var_2
Sort Key: table_A.var_1
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=8)
Output: (now() - '01:00:00'::interval)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.02 rows=1 width=8)
Output: ((now() - '7 days'::interval) - '01:00:00'::interval)
-> Foreign Scan on schema_A.table_A (cost=100.00..158.66 rows=1 width=16)
Output: table_A.var_1, table_A.var_2
Remote SQL: SELECT var_1, var_2 FROM model.table_A
WHERE ((execution_ts <= $1::timestamp with time zone))
AND ((execution_ts >= $2::timestamp with time zone))
AND ((model_id::text = 'ABCD'::text))
AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))
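The pattern in the fast query generalizes: wrapping any non-immutable timestamp expression in a scalar sub-select turns it into an InitPlan whose result is shipped to the remote server as a constant parameter. A minimal sketch of the before/after shape, using hypothetical table and column names (`remote_schema.events`, `created_at`):

```sql
-- Slow: now() is STABLE, not IMMUTABLE, so the comparison stays on the
-- local side and every candidate row crosses the network.
SELECT id, payload
FROM remote_schema.events              -- hypothetical foreign table
WHERE created_at >= now() - INTERVAL '1 day';

-- Fast: the scalar sub-select becomes an InitPlan evaluated once locally;
-- its result is a plain parameter that postgres_fdw can include in the
-- Remote SQL, so filtering happens on the remote server.
SELECT id, payload
FROM remote_schema.events
WHERE created_at >= (SELECT now() - INTERVAL '1 day');
```

Whether the second form is actually pushed down can be confirmed with `EXPLAIN (VERBOSE)`: the condition should appear in the `Remote SQL:` line rather than in a local `Filter:` line.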
Answer 0 (score: 4)
No function that is not IMMUTABLE will ever be pushed down.
See the is_foreign_expr function in contrib/postgres_fdw/deparse.c.
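You can check a function's volatility classification in the `pg_proc` catalog; only `'i'` (immutable) qualifies for this kind of push-down. A quick way to see why `now()` is excluded (runnable on any PostgreSQL server):

```sql
-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile.
-- now() is STABLE (fixed for the transaction, but not forever),
-- which is why postgres_fdw refuses to ship it to the remote server.
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('now', 'lower', 'random')
ORDER BY proname;
```

Expect `now` to show `s` and `random` to show `v`; only functions marked `i` (such as `lower`) pass the immutability test in deparse.c.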
Answer 1 (score: 1)
I think the problem is the execution of now() (which is what CURRENT_TIMESTAMP resolves to). The value it returns is fixed for the current transaction, which means it must be evaluated locally. Wrapping it in a sub-select forces now() to be evaluated into a constant value, which can then be sent for remote execution.
With a plain timestamp literal, a conversion between timestamp and timestamptz has to be performed. That conversion uses the current time-zone rules (per SET TIME ZONE TO ....), and the database chooses to convert the remote timestamptz to local time for the comparison, so again the timestamp literal must be handled locally.
In general, use timestamptz (instead of timestamp), unless you…
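To illustrate the literal-type difference this answer describes, here is a sketch using the question's own foreign table (`schema_A.table_A`, whose `execution_ts` column is assumed to be timestamptz); the behavior claimed in the comments matches the slow/fast pairing reported in the question:

```sql
-- Not pushed down: comparing a timestamptz column against a plain
-- timestamp literal inserts a timestamp -> timestamptz cast, and that
-- cast is only STABLE (its result depends on the session's TimeZone
-- setting), so the filter runs locally.
SELECT var_1, var_2
FROM schema_A.table_A
WHERE execution_ts <= TIMESTAMP '2018-05-04 12:00:00';

-- Pushed down: a timestamptz literal already carries its UTC offset, so
-- it is a simple constant of the column's own type and can appear
-- directly in the Remote SQL.
SELECT var_1, var_2
FROM schema_A.table_A
WHERE execution_ts <= TIMESTAMP WITH TIME ZONE '2018-05-04 12:00:00+00';
```

As with the sub-select workaround, `EXPLAIN (VERBOSE)` on each query shows whether the condition lands in `Remote SQL:` or in a local `Filter:`.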