Question

I'm currently using this method to sort the letters of a string alphabetically in PostgreSQL. Are there other, more efficient ways?

select string_agg(c, '') as s
from   (select unnest(regexp_split_to_array('ijsAafhareDbv', '')) as c 
        order  by c) as t; 

       s   
 --------------
 ADaabefhijrsv
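For reference, the same operation in plain Python (the language of the plpythonu function benchmarked below) is a one-liner: `sorted()` on a str returns its characters ordered by Unicode code point, and `join()` concatenates them. This is a minimal sketch run outside PostgreSQL; the name `sort_letters` is hypothetical. For ASCII input, code-point order equals byte order, which is why the uppercase letters come first, as in the output above.

```python
def sort_letters(s: str) -> str:
    """Return the characters of s sorted by Unicode code point."""
    # sorted() on a str yields a list of single-character strings
    return "".join(sorted(s))


print(sort_letters("ijsAafhareDbv"))  # ADaabefhijrsv
```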

Answer 1

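I created 3 functions: one using my query, one using Laurenz's query, and a third one, a Python (plpythonu) function that does the sort. Then I created a table with 100000 rows (I did this on my Mac laptop for now), each containing a random 15-character string generated with the random_string function from this link: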

create table t as select random_string(15) as s FROM generate_series(1,100000);

Here are the 3 functions.

CREATE or REPLACE FUNCTION sort1(x TEXT) RETURNS TEXT AS $$
select string_agg(c, '') as s
from   (select unnest(regexp_split_to_array($1, '')) as c 
        order  by c) as t;
$$ LANGUAGE SQL IMMUTABLE;


CREATE or REPLACE FUNCTION sort2(x TEXT) RETURNS TEXT AS $$
WITH t(s) AS (VALUES ($1))
SELECT string_agg(substr(t.s, g.g, 1), ''
                  ORDER BY substr(t.s, g.g, 1)
                 )
FROM t
   CROSS JOIN LATERAL generate_series(1, length(t.s)) g;

$$ LANGUAGE SQL IMMUTABLE;


-- on PostgreSQL 9.1+ the preferred form is: CREATE EXTENSION plpythonu;
create language plpythonu;
CREATE or REPLACE FUNCTION pysort(x text)
  RETURNS text
AS $$
  return ''.join(sorted(x))
$$ LANGUAGE plpythonu IMMUTABLE;
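One caveat worth checking before relying on pysort: according to the PL/Python documentation, text arguments arrive as byte strings in the server encoding under Python 2 (which plpythonu usually maps to), but as Unicode str under Python 3 (plpython3u). With a UTF-8 database and non-ASCII input, sorting bytes can split multibyte characters apart. The standalone Python 3 sketch below (not run inside PostgreSQL; the helper names are hypothetical) contrasts the two cases:

```python
def sort_chars(s: str) -> str:
    # Sorting a str sorts whole characters (Unicode code points)
    return "".join(sorted(s))


def sort_utf8_bytes(s: str) -> bytes:
    # Sorting the UTF-8 encoding sorts individual bytes, which can
    # separate the two bytes of a character such as "é" (0xC3 0xA9)
    return bytes(sorted(s.encode("utf-8")))


word = "café"
print(sort_chars(word))       # acfé
broken = sort_utf8_bytes(word)
print(broken)                 # b'acf\xa9\xc3'

try:
    broken.decode("utf-8")
    decodes = True
except UnicodeDecodeError:
    decodes = False
print(decodes)                # False
```

With plain ASCII data, like the random test strings used here, both variants give the same result.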

Here are the EXPLAIN ANALYSE results for all three.

knayak=# EXPLAIN ANALYSE select sort1(s)  FROM t;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.266..7097.740 rows=100000 loops=1)
 Planning time: 0.119 ms
 Execution time: 7106.871 ms
(3 rows)

knayak=# EXPLAIN ANALYSE select sort2(s)  FROM t;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.418..7012.935 rows=100000 loops=1)
 Planning time: 0.270 ms
 Execution time: 7021.587 ms
(3 rows)

knayak=# EXPLAIN ANALYSE select pysort(s) FROM t;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.060..389.729 rows=100000 loops=1)
 Planning time: 0.048 ms
 Execution time: 395.760 ms
(3 rows)

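As this analysis shows, the Python sort is by far the fastest, and there isn't much difference between the first two. Still, the performance needs to be verified on large tables in our actual system.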

Answer 2

If you want a solution without regular expressions, you can use:

WITH t(s) AS (VALUES ('amfjwzeils'))
SELECT string_agg(substr(t.s, g.g, 1), ''
                  ORDER BY substr(t.s, g.g, 1)
                 )
FROM t
   CROSS JOIN LATERAL generate_series(1, length(t.s)) g;

 string_agg 
------------
 aefijlmswz
(1 row)
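The query above builds the list of characters by position instead of by regex: `generate_series(1, length(t.s))` produces 1-based positions, `substr(t.s, g.g, 1)` extracts one character at each position, and `string_agg(... ORDER BY ...)` sorts and concatenates them. A Python sketch of the same idea (the function name is hypothetical; Python indexes from 0, so position `g` maps to `s[g - 1]`):

```python
def sort_by_position(s: str) -> str:
    # generate_series(1, length(s)) -> positions 1..len(s)
    positions = range(1, len(s) + 1)
    # substr(s, g, 1) -> the single character at 1-based position g
    chars = [s[g - 1] for g in positions]
    # string_agg(..., '' ORDER BY ...) -> sort, then concatenate
    return "".join(sorted(chars))


print(sort_by_position("amfjwzeils"))  # aefijlmswz
```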

I'd test which solution is faster.

Answer 3

Functions implemented in C are substantially faster than anything we can achieve with LANGUAGE sql or plpgsql. So your plpythonu function wins the performance race by a long shot.

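But plpythonu is an untrusted procedural language. It is not installed by default, and only superusers can create functions in untrusted languages. You need to be aware of the security implications. Most cloud services don't offer untrusted languages at all. The current manual (quote from pg 10):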

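PL/Python is only available as an "untrusted" language, meaning it does not offer any way of restricting what users can do in it and is therefore named plpythonu. A trusted variant plpython might become available in the future if a secure execution mechanism is developed in Python. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu.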

The SQL functions you tested are not well optimized. There are a thousand and one ways to improve performance; here are some:

Demo

-- func to create random strings
CREATE OR REPLACE FUNCTION f_random_string(int)
  RETURNS text AS
$func$
SELECT array_to_string(ARRAY(
          SELECT substr('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', (ceil(random()*62))::int, 1)
          FROM   generate_series(1, $1)
          ), '')
$func$  LANGUAGE sql VOLATILE;

-- test tbl with 100K rows
CREATE TABLE tbl(str text);

INSERT INTO tbl
SELECT f_random_string(15)
FROM   generate_series(1, 100000) g;

VACUUM ANALYZE tbl;
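f_random_string picks each character with `substr(alphabet, ceil(random()*62)::int, 1)`, i.e. a 1-based position from 1 to 62 in a 62-character alphabet. A rough Python equivalent for generating comparable test data outside the database, assuming uniform sampling with replacement is all that matters (the names `ALPHABET`, `random_string` and `rows` are hypothetical):

```python
import random
import string

# Same 62-character alphabet as f_random_string: digits, then A-Z, then a-z
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_string(n: int) -> str:
    """Return n characters drawn uniformly, with replacement, from ALPHABET."""
    return "".join(random.choices(ALPHABET, k=n))


# 100K strings of 15 characters, like the INSERT INTO tbl statement
rows = [random_string(15) for _ in range(100_000)]
```

One small difference: `random()` returns values in [0, 1), so in the SQL version `ceil(random()*62)` is 0 only if `random()` returns exactly 0.0, in which case `substr()` returns an empty string and that row gets a shorter string. `random.choices()` always returns one of the 62 characters.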

-- 1: your test function 1 (very inefficient)
CREATE OR REPLACE FUNCTION sort1(text)
  RETURNS text AS
$func$
SELECT string_agg(c, '')
FROM  (SELECT unnest(regexp_split_to_array($1, '')) AS c ORDER BY c) t;
$func$  LANGUAGE sql IMMUTABLE;

-- 2: your test function 2 (inefficient)
CREATE OR REPLACE FUNCTION sort2(text)
  RETURNS text AS
$func$
WITH t(s) AS (VALUES ($1))
SELECT string_agg(substr(t.s, g.g, 1), '' ORDER BY substr(t.s, g.g, 1))
FROM   t
CROSS  JOIN LATERAL generate_series(1, length(t.s)) g;
$func$  LANGUAGE sql IMMUTABLE;

-- 3: remove pointless CTE from sort2
CREATE OR REPLACE FUNCTION sort3(text)
  RETURNS text AS
$func$
SELECT string_agg(substr($1, g, 1), '' ORDER BY substr($1, g, 1))
FROM   generate_series(1, length($1)) g;
$func$  LANGUAGE sql IMMUTABLE;

-- 4: use unnest instead of calling substr N times
CREATE OR REPLACE FUNCTION sort4(text)
  RETURNS text AS
$func$
SELECT string_agg(c, '' ORDER BY c)
FROM   unnest(string_to_array($1, NULL)) c
$func$  LANGUAGE sql IMMUTABLE;

-- 5: ORDER BY in subquery
CREATE OR REPLACE FUNCTION sort5(text)
  RETURNS text AS
$func$
SELECT string_agg(c, '')
FROM  (
   SELECT c
   FROM   unnest(string_to_array($1, NULL)) c
   ORDER  BY c
   ) sub
$func$  LANGUAGE sql IMMUTABLE;

-- 6: SRF in SELECT list
CREATE OR REPLACE FUNCTION sort6(text)
  RETURNS text AS
$func$
SELECT string_agg(c, '')
FROM  (SELECT unnest(string_to_array($1, NULL)) c ORDER BY 1) sub
$func$  LANGUAGE sql IMMUTABLE;

-- 7: ARRAY constructor instead of aggregate func
CREATE OR REPLACE FUNCTION sort7(text)
  RETURNS text AS
$func$
SELECT array_to_string(ARRAY(SELECT unnest(string_to_array($1, NULL)) c ORDER BY c), '')
$func$  LANGUAGE sql IMMUTABLE;

-- 8: The same with COLLATE "C"
CREATE OR REPLACE FUNCTION sort8(text)
  RETURNS text AS
$func$
SELECT array_to_string(ARRAY(SELECT unnest(string_to_array($1 COLLATE "C", NULL)) c ORDER BY c), '')
$func$  LANGUAGE sql IMMUTABLE;

SELECT str, sort1(str), sort2(str), sort3(str), sort4(str), sort5(str), sort6(str), sort7(str), sort8(str) FROM tbl LIMIT 1; -- result sample

str             | sort1           | sort2           | sort3           | sort4           | sort5           | sort6           | sort7           | sort8
:-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :--------------
tUkmori4D1rHhI1 | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DHIUhikmorrt

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort1(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.053 ms
 Execution time: 2742.904 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort2(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.105 ms
 Execution time: 2579.397 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort3(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.079 ms
 Execution time: 2191.228 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort4(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.075 ms
 Execution time: 2194.780 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort5(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.083 ms
 Execution time: 1902.829 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort6(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.075 ms
 Execution time: 1866.407 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort7(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.067 ms
 Execution time: 1863.713 ms

EXPLAIN (ANALYZE, TIMING OFF) SELECT sort8(str) FROM tbl;

 QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1)
 Planning time: 0.074 ms
 Execution time: 1569.376 ms

db<>fiddle here

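The last function (sort8) sorts without COLLATION rules, strictly by the byte values of the characters, which is much cheaper. But you may or may not need the sort order of a specific locale.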

The manual about COLLATION expressions.
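The difference is visible in the result sample above (sort1 to sort7 versus sort8). A small Python sketch reproduces both orders for that sample string. Plain `sorted()` compares code points, which for ASCII matches COLLATE "C". The second function is only a rough stand-in for a linguistic collation (case-insensitive first, lowercase before uppercase on ties); that key is an assumption chosen because it happens to match this sample, not the actual glibc or ICU rules PostgreSQL uses. Both function names are hypothetical.

```python
def sort_bytewise(s: str) -> str:
    # Code-point order; for ASCII this equals byte order, as with COLLATE "C"
    return "".join(sorted(s))


def sort_case_insensitive_first(s: str) -> str:
    # Rough approximation of a locale-aware order: compare characters
    # case-insensitively, then put lowercase before uppercase on ties.
    # NOT a real locale collation (glibc/ICU rules are far more detailed).
    return "".join(sorted(s, key=lambda c: (c.casefold(), c.isupper())))


sample = "tUkmori4D1rHhI1"
print(sort_bytewise(sample))                # 114DHIUhikmorrt
print(sort_case_insensitive_first(sample))  # 114DhHiIkmorrtU
```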

Sort letters in a string alphabetically in PostgreSQL