Question

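This query takes a very long time to complete. With only 10k records, that surprises me. Is there a more efficient way to run a query that flags duplicates based on a column?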

UPDATE exportable_businesses e1 SET phone_dupe = 
 (CASE WHEN
   (SELECT COUNT(sidewalk_business_id) FROM exportable_businesses e2 WHERE query_id = #{id} AND e1.phone_number=e2.phone_number) > 1 
       THEN 'x' ELSE NULL END)

Answer 1

First, try computing the count for each phone number, for example:

create temporary table phone_cnt as 
   select phone_number, count(*) as c from exportable_businesses 
   where query_id = #{id} 
   group by phone_number

Then use the precomputed counts to set phone_dupe. Postgres should be able to do an update with a join, for example:

update exportable_businesses e1 
   set phone_dupe = (case when pc.c ...)
   from phone_cnt pc 
   where pc.phone_number = e1.phone_number
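
A sketch of what the complete update might look like, assuming the same rule as in the question (more than one row per phone number means a duplicate). The e1.query_id filter is an assumption added here so that only rows belonging to the counted query are touched; #{id} is the application's interpolated placeholder, as in the question.

```sql
-- Assumes phone_cnt holds one row per phone_number with its count in column c.
update exportable_businesses e1
   set phone_dupe = (case when pc.c > 1 then 'x' else null end)
   from phone_cnt pc
   where pc.phone_number = e1.phone_number
     and e1.query_id = #{id};  -- assumption: restrict the update to the same query
```

Unlike the correlated subquery in the question, a joined update leaves rows without a match in phone_cnt untouched, including rows whose phone_number is NULL, so any stale 'x' markers on such rows would need to be cleared separately.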

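If this is still slow, create an explicit index on phone_cnt (phone_number) before running the update query. The whole computation should then take close to linear time, rather than quadratic time as with the correlated subquery in the question, which rescans the table for every row.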
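
A minimal sketch of that index step. The index name is hypothetical, and the analyze call is an extra suggestion rather than part of the original answer.

```sql
-- phone_cnt_phone_number_idx is a hypothetical index name.
create index phone_cnt_phone_number_idx on phone_cnt (phone_number);

-- Autovacuum does not process temporary tables, so refresh planner statistics by hand.
analyze phone_cnt;
```

Running both statements after the temporary table is populated and before the update gives the planner current row counts to choose a join strategy.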

You can drop the temporary phone_cnt table after the query.
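
A short sketch of the cleanup. Temporary tables also vanish when the session ends; if the whole job runs inside one transaction, PostgreSQL's ON COMMIT DROP clause is one possible alternative to an explicit drop.

```sql
-- Explicit cleanup once the update has finished.
drop table if exists phone_cnt;

-- Alternative: let PostgreSQL drop the table automatically at commit.
-- This only works when the create and the update share one transaction.
begin;
create temporary table phone_cnt on commit drop as
   select phone_number, count(*) as c from exportable_businesses
   where query_id = #{id}
   group by phone_number;
-- ... run the update against phone_cnt here ...
commit;
```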
