PostgreSQL foreign tables EXISTS clause

Date: 2015-07-28 17:05:25

Tags: postgresql postgresql-9.3 foreign-data-wrapper

I'm running two different postgres 9.3 instances (one for production, one for development/testing). I want to copy a subset of a table in production into development.

Let's say then that the table I want to copy is defined as

CREATE TABLE users (user_id varchar PRIMARY KEY, other_stuff varchar);

and the subset I want to copy is all the users whose user_id appears in a cached table (on production), which is much smaller than the users table:

CREATE TABLE user_id_subset (user_id varchar PRIMARY KEY);

I've set up foreign tables on my development db to access these two tables (named foreign_users and foreign_user_id_subset, respectively), and I want to run a query like:
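For reference, the foreign-table setup described above might look roughly like the following. This is only a sketch that assumes the postgres_fdw wrapper is in use (the question does not say which wrapper it is); the server name production_server, the host, the database name, and the credentials are all placeholders, not values from the original setup.

```sql
-- Run on the development database.
-- Assumption: postgres_fdw is the wrapper; every name and option value
-- below except the table and column names is a placeholder.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER production_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'production.example.com', port '5432', dbname 'production_db');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER production_server
    OPTIONS (user 'readonly_user', password 'secret');

-- Foreign tables cannot carry a PRIMARY KEY, so only the columns are declared.
CREATE FOREIGN TABLE foreign_users (
    user_id     varchar,
    other_stuff varchar
) SERVER production_server
  OPTIONS (schema_name 'public', table_name 'users');

CREATE FOREIGN TABLE foreign_user_id_subset (
    user_id varchar
) SERVER production_server
  OPTIONS (schema_name 'public', table_name 'user_id_subset');
```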

INSERT INTO development_users (user_id, other_stuff)
    SELECT user_id, other_stuff FROM foreign_users f
    WHERE EXISTS (
        SELECT 1 FROM foreign_user_id_subset ss
        WHERE ss.user_id = f.user_id);

This query works, but I'm worried about performance. The result of an explain gives me something like this:

Insert on development_users (cost=262.01..284.09 rows=138 width=272)
  ->  Hash Join  (cost=262.01..284.09 rows=138 width=272)
        Hash Cond: ((f.user_id)::text = (cache.user_id)::text)
        ->  Foreign Scan on foreign_users f  (cost=100.00..118.28 rows=276 width=272)
        ->  Hash  (cost=159.52..159.52 rows=200 width=32)
              ->  HashAggregate  (cost=157.52..159.52 rows=200 width=32)
                    ->  Foreign Scan on foreign_user_id_subset  (cost=100.00..153.86 rows=1462 width=32)

What I think is happening is that my development db sends the request to my production db, which creates the temp hash of foreign_user_id_subset (user_id_subset on production) and does the hash check on production. This way, the only thing that gets sent across the wire (between the databases) is the initial request, and then the result of the select query. Is this true?

An alternative idea would be to store the result of this query in a regular table on my production database (it can't be a true TEMP table, because I need a foreign table pointing at it), then build a foreign table for it and just do a SELECT * on that foreign table from development.

(It should be noted that my production database is a much pricier and more performant RDS instance than my development database.)

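1 answer:

Answer 0 (score: 0)

Answering my own question:

I implemented the alternative idea described above: on production, I generated a table containing only the result of the EXISTS query, and then on development I created a foreign table referencing that table. To create the subset table, I then simply ran an 'INSERT INTO ... SELECT * FROM ...'.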
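The two steps could be sketched as below. The table names users_subset_export and foreign_users_subset_export, and the server name production_server, are hypothetical names chosen for illustration; the answer does not give the names actually used, and the sketch assumes a postgres_fdw server is already defined on development.

```sql
-- Step 1, on production: store the EXISTS result in an ordinary table
-- so the join runs entirely on the production instance.
-- users_subset_export is a hypothetical name.
CREATE TABLE users_subset_export AS
    SELECT u.user_id, u.other_stuff
    FROM users u
    WHERE EXISTS (
        SELECT 1 FROM user_id_subset ss
        WHERE ss.user_id = u.user_id);

-- Step 2, on development: point a foreign table at the new table.
-- production_server is an assumed, already-defined foreign server.
CREATE FOREIGN TABLE foreign_users_subset_export (
    user_id     varchar,
    other_stuff varchar
) SERVER production_server
  OPTIONS (table_name 'users_subset_export');

-- Step 3, on development: a plain copy, with no join against foreign tables.
INSERT INTO development_users (user_id, other_stuff)
    SELECT user_id, other_stuff FROM foreign_users_subset_export;
```

With this layout only the rows that are actually needed cross the wire, and the development instance does no join work.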

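This approach was far faster than the original one.

Time to create the table on production: increment

Time to insert into the development database: Total runtime: 204.838 ms

For the original approach: it was unacceptably slow, taking > 10 minutes to reach the same result as above (I didn't even wait for the EXPLAIN ANALYZE to finish).

I'm not aware of a better way to determine the low-level instructions issued by the planner, but I believe the original approach performs a sequential scan (by repeatedly fetching chunks of the foreign table) and executes the hash comparison on the development database.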
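One way to check this, assuming the foreign tables are backed by postgres_fdw, is to run EXPLAIN with the VERBOSE option on the original query. With postgres_fdw, each Foreign Scan node then shows a "Remote SQL" line containing the statement sent to the remote server:

```sql
-- Run on the development database.
-- Assumption: the foreign tables use postgres_fdw, which prints a
-- "Remote SQL" line per Foreign Scan node under VERBOSE.
EXPLAIN (VERBOSE)
SELECT user_id, other_stuff FROM foreign_users f
WHERE EXISTS (
    SELECT 1 FROM foreign_user_id_subset ss
    WHERE ss.user_id = f.user_id);
```

If each Remote SQL line is a plain SELECT of one table, with no join or EXISTS in it, then both tables are being fetched in full and the hash join runs on development. That would be consistent with the plan in the question, where the Hash Join sits above two separate Foreign Scan nodes. As far as I know, postgres_fdw in 9.3 does not push joins down to the remote server at all.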