Question

I'm no PostgreSQL expert, and I've been struggling with this problem.

I have a fairly simple query:

SELECT exists(SELECT 1 FROM res_users_log WHERE create_uid=u.id), count(1)
FROM res_users u
WHERE active=true
GROUP BY 1

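Basically it counts the active users with and without log entries. Both tables are fairly large (about 600k records each) and have indexes on their IDs.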

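This query runs in about 500 ms on our server but hangs completely on my machine (same psql version, 9.3). My database was restored from a dump of the server, so the indexes were rebuilt on import.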

When I run EXPLAIN ANALYZE on the query, I get different results on the server and on my machine.

Locally I get:

 HashAggregate  (cost=78496.43..88302.28 rows=2 width=4) (actual time=518.003..518.003 rows=1 loops=1)
   ->  Index Scan using res_users_pkey on res_users u  (cost=0.42..78496.35 rows=16 width=4) (actual time=51.393..517.969 rows=11 loops=1)
         Index Cond: (id < 20)
         Filter: active
         Rows Removed by Filter: 7
         SubPlan 1
           ->  Seq Scan on res_users_log  (cost=0.00..9805.83 rows=2 width=0) (actual time=47.078..47.078 rows=1 loops=11)
                 Filter: (create_uid = u.id)
                 Rows Removed by Filter: 516910
 Total runtime: 518.034 ms
(10 rows)

(I had to add id < 20 to get the query to finish at all.)
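A rough estimate of what the local plan implies for the full query. In the plan above, SubPlan 1 is a sequential scan of res_users_log that runs once per outer row (loops=11), at about 47 ms per run. Assuming the restored local database has the same 527,661 active users that the server plan reports, the correlated subplan would run that many times:

```latex
% Hedged estimate: assumes the local row counts match the server's.
527{,}661\ \text{loops} \times 47.078\ \text{ms/loop} \approx 24{,}841\ \text{s} \approx 6.9\ \text{h}
```

That is well beyond the 2 hours after which the query was killed, so the "hang" is consistent with this plan simply being very slow.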

On the server I get:

 HashAggregate  (cost=5389666981.78..5389687409.80 rows=2 width=4) (actual time=532.664..532.665 rows=2 loops=1)
   ->  Seq Scan on res_users u  (cost=0.00..5389664343.42 rows=527672 width=4) (actual time=256.169..467.829 rows=527661 loops=1)
         Filter: active
         Rows Removed by Filter: 381
         SubPlan 1
           ->  Seq Scan on res_users_log  (cost=0.00..10214.00 rows=1 width=0) (never executed)
                 Filter: (create_uid = u.id)
         SubPlan 2
           ->  Seq Scan on res_users_log res_users_log_1  (cost=0.00..8800.60 rows=565360 width=4) (actual time=0.006..45.697 rows=547108 loops=1)
 Total runtime: 532.757 ms
(10 rows)

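I've been trying to figure out why the query plans differ (I don't understand the SubPlan 2 entry) and what could make the query take more than 2 hours on my laptop (I killed it at that point).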
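One plausible explanation for SubPlan 2, offered as an assumption to check rather than a confirmed diagnosis: for a correlated EXISTS like this one, PostgreSQL can plan two alternatives, the per-row SubPlan 1 and a hashed SubPlan 2 that scans res_users_log once into a hash table, and chooses between them when execution starts. The server used the hashed form (SubPlan 1 was never executed), while the local plan has no hashed alternative at all. The planner only considers the hashed form when its hash table is estimated to fit in work_mem, so a smaller work_mem on the laptop would produce exactly this difference. The '64MB' value below is arbitrary and only for illustration.

```sql
-- Compare this value on the server and on the laptop.
SHOW work_mem;

-- Hypothetical experiment: raise work_mem for this session only.
-- '64MB' is an illustrative value, not a tuned recommendation.
SET work_mem = '64MB';

-- Plain EXPLAIN (no ANALYZE) shows the plan without running the slow query;
-- look for a hashed SubPlan in the output.
EXPLAIN
SELECT exists(SELECT 1 FROM res_users_log WHERE create_uid = u.id), count(1)
FROM res_users u
WHERE active = true
GROUP BY 1;

-- Return to the configured value for this session.
RESET work_mem;
```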

I've vacuumed both tables, with no noticeable difference.

Any ideas what could make it hang like this?

Answer 1

If you want the number of active users that have log entries, I would expect:

SELECT count(*)
FROM res_users u
WHERE u.active = true and
      exists (SELECT 1 FROM res_users_log l WHERE l.create_uid = u.id);

Then indexes on res_users(active, id) and res_users_log(create_uid) would be optimal.
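A sketch of those two indexes follows. The index names are hypothetical; only the column lists come from the suggestion above. Refreshing statistics afterwards lets the planner cost the new indexes sensibly.

```sql
-- Hypothetical index names; column lists as suggested in the answer.
CREATE INDEX res_users_active_id_idx ON res_users (active, id);
CREATE INDEX res_users_log_create_uid_idx ON res_users_log (create_uid);

-- Update planner statistics for both tables.
ANALYZE res_users;
ANALYZE res_users_log;
```

The create_uid index is probably the one that matters most here: in the local plan each EXISTS probe is a sequential scan that discards about 517k rows, whereas an index on create_uid lets each probe become an index lookup.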

Query plan & performance variation on another machine
