Question

I am using Postgres 9.5 on Linux 7. Here is the setup:

create table t1(c1 int primary key, c2 varchar(100));

Insert some rows into the newly created table:

do $$
begin
for i in 1..12000000 loop
insert into t1 values(i,to_char(i,'9999999'));
end loop;
end $$;

Now I want to update column c2 where c1 equals a random value (EXPLAIN shows that the index is not used).

explain update t1 set c2=to_char(4,'9999999') where c1=cast(floor(random()*100000) as int);
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Update on t1  (cost=10000000000.00..10000000017.20 rows=1 width=10)
   ->  Seq Scan on t1  (cost=10000000000.00..10000000017.20 rows=1 width=10)
         Filter: (c1 = (floor((random() * '100000'::double precision)))::integer)
(3 rows)

Now, if I replace "cast(floor(random()*100000) as int)" with a number (any number), the index is used:

explain update t1 set c2=to_char(4,'9999999') where c1=12345;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Update on t1  (cost=0.15..8.17 rows=1 width=10)
   ->  Index Scan using t1_pkey on t1  (cost=0.15..8.17 rows=1 width=10)
         Index Cond: (c1 = 12345)
(3 rows)

The questions are:

Why does Postgres not use the index in the first case (when random() is used)?
How can I force Postgres to use the index?

Answer 1

This is because random() is a volatile function (see PostgreSQL CREATE FUNCTION), which means it must be (re)evaluated for every row.
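The volatility category of a function is recorded in the pg_proc system catalog, so the claim can be checked directly. A minimal sketch, assuming a psql session on the same database:

```sql
-- provolatile is 'v' for volatile, 's' for stable, 'i' for immutable.
-- random() is expected to be reported as 'v'.
select proname, provolatile
  from pg_proc
 where proname = 'random';
```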

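So you are not actually updating one random row each time (which, as I understand it, is what you want) but a random number of rows: every row whose own randomly generated number happens to match its id. Given the probabilities involved, that count tends toward 0.

You can see this by using a smaller range for the randomly generated number: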

test=# select * from t1 where c1=cast(floor(random()*10) as int);
 c1 | c2
----+----
(0 rows)

test=# select * from t1 where c1=cast(floor(random()*10) as int);
 c1 |    c2
----+----------
  3 |        3
(1 row)

test=# select * from t1 where c1=cast(floor(random()*10) as int);
 c1 |    c2
----+----------
  4 |        4
  9 |        9
(2 rows)

test=# select * from t1 where c1=cast(floor(random()*10) as int);
 c1 |    c2
----+----------
  5 |        5
  8 |        8
(2 rows)

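If you want to retrieve only one random row, you first need to generate a single random id to compare against the row id.

Hint: you can think of the database planner as dumb, always performing a sequential scan over all rows and evaluating the condition expression once per row. Under the hood, the planner is smarter than that: if it knows that the expression gives the same result every time it is evaluated (within the same statement), it evaluates it once and performs an index scan.

A tricky (but dirty) solution could be to create your own random_stable() function and declare it as stable even though it returns a randomly generated number.

...This would keep your query as simple as it is now. But I consider it a dirty solution because it hides the fact that the function is, in reality, volatile.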
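For illustration only, such a wrapper might look like the sketch below. The name random_stable comes from the answer; the body and the choice of a plain SQL function are assumptions, and the stable label deliberately misdescribes the function, which is exactly why the answer calls this approach dirty.

```sql
-- Hypothetical wrapper: declared STABLE although random() is volatile.
-- PostgreSQL does not verify the volatility label, so this is accepted,
-- but the planner is being told something that is not true.
create function random_stable() returns double precision
language sql stable
as $$ select random() $$;

-- The original query, with only the function name changed.
explain
update t1
   set c2 = to_char(4,'9999999')
 where c1 = cast(floor(random_stable()*100000) as int);
```

With the stable label, the planner is allowed to treat the comparison value as fixed for the statement and can therefore consider the primary key index.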

A better solution (the right one, in my view) is to write the query in a form that really generates a single number.

For example:

test=# with foo as (select floor(random()*1000000)::int as bar) select * from t1 join foo on (t1.c1 = foo.bar);
 c1  |    c2    | bar
-----+----------+-----
 929 |      929 | 929
(1 row)

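...or a subquery solution like the one provided by @a_horse_with_no_name.

Note: I used select queries rather than update queries for simplicity and readability, but the case is the same: just use the same where clause (with the subquery approach; using with would, of course, be a little trickier...). Then, to check whether the index is used, you only need to prepend "explain", as you know.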
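For the update itself, the with form could be sketched as follows, assuming the table t1 from the question. The names foo and bar are the illustrative ones from the select example above, and the range 100000 is taken from the question; the resulting plan is not shown because it may differ between versions.

```sql
-- Sketch only: the CTE produces one row holding one random id,
-- and the update joins t1 against that single row.
explain
with foo as (select floor(random()*100000)::int as bar)
update t1
   set c2 = to_char(4,'9999999')
  from foo
 where t1.c1 = foo.bar;
```

Dropping the leading explain runs the update instead of only showing its plan.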

Answer 2

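Not sure why the index is not used, probably because of the definition of the random() function. If you use a sub-select to call the function, then (at least for me on 9.5.3) Postgres uses the index: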

explain 
update t1 
   set c2=to_char(4,'9999999') 
where c1= (select cast(floor(random()*100000) as int));

This returns:

Update on t1  (cost=0.44..3.45 rows=1 width=10)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=0)
  ->  Index Scan using t1_pkey on t1  (cost=0.43..3.44 rows=1 width=10)
        Index Cond: (c1 = $0)
