I'm currently using the following method to sort the letters of a string alphabetically in PostgreSQL. Is there a more efficient way?
select string_agg(c, '') as s
from (select unnest(regexp_split_to_array('ijsAafhareDbv', '')) as c
order by c) as t;
s
--------------
ADaabefhijrsv
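Answer 0 (score: 3)
I created 3 functions: one using my query, one using Laurenz's query, and a third one, a Python (plpythonu) function that does the sorting. Finally, I created a table with 100000 rows (done on my Mac laptop), each containing a random 15-character string generated with the random_string function from this link.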
create table t as select random_string(15) as s FROM generate_series(1,100000);
Here are the 3 functions.
CREATE or REPLACE FUNCTION sort1(x TEXT) RETURNS TEXT AS $$
select string_agg(c, '') as s
from (select unnest(regexp_split_to_array($1, '')) as c
order by c) as t;
$$ LANGUAGE SQL IMMUTABLE;
CREATE or REPLACE FUNCTION sort2(x TEXT) RETURNS TEXT AS $$
WITH t(s) AS (VALUES ($1))
SELECT string_agg(substr(t.s, g.g, 1), ''
ORDER BY substr(t.s, g.g, 1)
)
FROM t
CROSS JOIN LATERAL generate_series(1, length(t.s)) g;
$$ LANGUAGE SQL IMMUTABLE;
create language plpythonu;
CREATE or REPLACE FUNCTION pysort(x text)
RETURNS text
AS $$
return ''.join(sorted(x))
$$ LANGUAGE plpythonu IMMUTABLE;
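The body of pysort can also be tried outside the database. The sketch below is plain Python 3, not a PL/Python function, and the name sort_chars is hypothetical. It assumes nothing beyond the built-in sorted(), which compares characters by Unicode code point, so uppercase ASCII letters come before lowercase ones.

```python
def sort_chars(s: str) -> str:
    """Return the characters of s in ascending code-point order."""
    # sorted() splits the string into single characters and orders them;
    # join() glues them back together, mirroring the pysort body above.
    return "".join(sorted(s))


if __name__ == "__main__":
    # Same input as in the question.
    print(sort_chars("ijsAafhareDbv"))  # prints ADaabefhijrsv
```

The result matches the output shown in the question, which suggests that database was sorting in code-point order as well.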
Here are the EXPLAIN ANALYSE results for all three.
knayak=# EXPLAIN ANALYSE select sort1(s) FROM t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.266..7097.740 rows=100000 loops=1)
Planning time: 0.119 ms
Execution time: 7106.871 ms
(3 rows)
knayak=# EXPLAIN ANALYSE select sort2(s) FROM t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.418..7012.935 rows=100000 loops=1)
Planning time: 0.270 ms
Execution time: 7021.587 ms
(3 rows)
knayak=# EXPLAIN ANALYSE select pysort(s) FROM t;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.060..389.729 rows=100000 loops=1)
Planning time: 0.048 ms
Execution time: 395.760 ms
(3 rows)
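This analysis shows that the Python sort is the fastest by far, and that there is no significant difference between the first two. Performance on large tables still needs to be checked on your own system, though.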
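To put a number on the difference, the reported execution times can be turned into ratios. This is only arithmetic on the three figures above, and the helper name speedup_over is hypothetical. On this single run, pysort comes out roughly 18 times faster than either SQL function.

```python
# Execution times in milliseconds, copied from the EXPLAIN ANALYSE output above.
execution_ms = {"sort1": 7106.871, "sort2": 7021.587, "pysort": 395.760}


def speedup_over(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` ran than `baseline`."""
    return execution_ms[baseline] / execution_ms[candidate]


if __name__ == "__main__":
    for name in ("sort1", "sort2"):
        ratio = speedup_over(name, "pysort")
        print(f"pysort vs {name}: {ratio:.1f}x faster")
```

These ratios describe one run on one laptop, so they indicate the order of magnitude rather than a precise benchmark.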
Answer 1 (score: 2)
If you want a solution without regular expressions, you can use:
WITH t(s) AS (VALUES ('amfjwzeils'))
SELECT string_agg(substr(t.s, g.g, 1), ''
ORDER BY substr(t.s, g.g, 1)
)
FROM t
CROSS JOIN LATERAL generate_series(1, length(t.s)) g;
string_agg
------------
aefijlmswz
(1 row)
I'll test which solution is faster.
Answer 2 (score: 1)
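The query above works by position: generate_series produces the indexes 1..n, substr picks the character at each index, and string_agg reassembles them in sorted order. A rough Python analogue of those three steps is sketched below. The function name sort_by_position is hypothetical, and Python's sorted() uses code-point order, which may differ from the database's collation for mixed-case input.

```python
def sort_by_position(s: str) -> str:
    """Sort the characters of s by walking over 1-based positions."""
    positions = range(1, len(s) + 1)         # like generate_series(1, length(s))
    chars = [s[g - 1:g] for g in positions]  # like substr(s, g, 1)
    return "".join(sorted(chars))            # like string_agg(... ORDER BY ...)


if __name__ == "__main__":
    # Same input as in the query above.
    print(sort_by_position("amfjwzeils"))  # prints aefijlmswz
```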
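Functions implemented in C are substantially faster than anything we can achieve with LANGUAGE sql or plpgsql. So your plpythonu function wins the performance contest by a long shot.
But plpythonu is an untrusted procedural language. It is not installed by default, and only superusers can create functions in untrusted languages. You need to be aware of the security implications. Most cloud services do not offer untrusted languages.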
The current manual (quote from pg 10):
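PL/Python is only available as an "untrusted" language, meaning it does not offer any way of restricting what users can do in it and is therefore named plpythonu. A trusted variant plpython might become available in the future if a secure execution mechanism is developed in Python. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu.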
The SQL functions you tested are not well optimized. There are a thousand and one ways to improve performance, yet:
-- func to create random strings
CREATE OR REPLACE FUNCTION f_random_string(int)
  RETURNS text AS
$func$
SELECT array_to_string(ARRAY(
   SELECT substr('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', (ceil(random()*62))::int, 1)
   FROM   generate_series(1, $1)
   ), '')
$func$ LANGUAGE sql VOLATILE;

-- test tbl with 100K rows
CREATE TABLE tbl(str text);
INSERT INTO tbl
SELECT f_random_string(15)
FROM   generate_series(1, 100000) g;
VACUUM ANALYZE tbl;
-- 1: your test function 1 (inefficient)
CREATE OR REPLACE FUNCTION sort1(text) RETURNS text AS
$func$  -- your test function 1 (very inefficient)
SELECT string_agg(c, '')
FROM  (SELECT unnest(regexp_split_to_array($1, '')) AS c ORDER BY c) t;
$func$ LANGUAGE sql IMMUTABLE;

-- 2: your test function 2 (inefficient)
CREATE OR REPLACE FUNCTION sort2(text) RETURNS text AS
$func$
WITH t(s) AS (VALUES ($1))
SELECT string_agg(substr(t.s, g.g, 1), '' ORDER BY substr(t.s, g.g, 1))
FROM   t
CROSS  JOIN LATERAL generate_series(1, length(t.s)) g;
$func$ LANGUAGE sql IMMUTABLE;

-- 3: remove pointless CTE from sort2
CREATE OR REPLACE FUNCTION sort3(text) RETURNS text AS
$func$
SELECT string_agg(substr($1, g, 1), '' ORDER BY substr($1, g, 1))
FROM   generate_series(1, length($1)) g;
$func$ LANGUAGE sql IMMUTABLE;

-- 4: use unnest instead of calling substr N times
CREATE OR REPLACE FUNCTION sort4(text) RETURNS text AS
$func$
SELECT string_agg(c, '' ORDER BY c)
FROM   unnest(string_to_array($1, NULL)) c
$func$ LANGUAGE sql IMMUTABLE;

-- 5: ORDER BY in subquery
CREATE OR REPLACE FUNCTION sort5(text) RETURNS text AS
$func$
SELECT string_agg(c, '')
FROM  (
   SELECT c
   FROM   unnest(string_to_array($1, NULL)) c
   ORDER  BY c
   ) sub
$func$ LANGUAGE sql IMMUTABLE;

-- 6: SRF in SELECT list
CREATE OR REPLACE FUNCTION sort6(text) RETURNS text AS
$func$
SELECT string_agg(c, '')
FROM  (SELECT unnest(string_to_array($1, NULL)) c ORDER BY 1) sub
$func$ LANGUAGE sql IMMUTABLE;

-- 7: ARRAY constructor instead of aggregate func
CREATE OR REPLACE FUNCTION sort7(text) RETURNS text AS
$func$
SELECT array_to_string(ARRAY(SELECT unnest(string_to_array($1, NULL)) c ORDER BY c), '')
$func$ LANGUAGE sql IMMUTABLE;

-- 8: The same with COLLATE "C"
CREATE OR REPLACE FUNCTION sort8(text) RETURNS text AS
$func$
SELECT array_to_string(ARRAY(SELECT unnest(string_to_array($1 COLLATE "C", NULL)) c ORDER BY c), '')
$func$ LANGUAGE sql IMMUTABLE;
SELECT str, sort1(str), sort2(str), sort3(str), sort4(str), sort5(str), sort6(str), sort7(str), sort8(str) FROM tbl LIMIT 1; -- result sample
str             | sort1           | sort2           | sort3           | sort4           | sort5           | sort6           | sort7           | sort8
:-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :--------------
tUkmori4D1rHhI1 | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DHIUhikmorrt
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort1(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.053 ms                                                                  |
| Execution time: 2742.904 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort2(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.105 ms                                                                  |
| Execution time: 2579.397 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort3(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.079 ms                                                                  |
| Execution time: 2191.228 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort4(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.075 ms                                                                  |
| Execution time: 2194.780 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort5(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.083 ms                                                                  |
| Execution time: 1902.829 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort6(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.075 ms                                                                  |
| Execution time: 1866.407 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort7(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.067 ms                                                                  |
| Execution time: 1863.713 ms                                                              |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort8(str) FROM tbl;
| QUERY PLAN                                                                               |
| :--------------------------------------------------------------------------------------- |
| Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) |
| Planning time: 0.074 ms                                                                  |
| Execution time: 1569.376 ms                                                              |
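db<>fiddle here
The last one (sort8, with COLLATE "C") sorts without collation rules, strictly by the byte value of the characters, which is considerably cheaper. But you may or may not need the sort order of a different locale.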
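The difference between the two orderings in the result sample can be illustrated in Python. In the sketch below, sort_bytewise uses plain code-point comparison, which for this ASCII input gives the same order as COLLATE "C". The function sort_case_folded is only a rough stand-in for a locale-aware collation: its key (letter first, then lowercase before uppercase) is an assumption chosen because it reproduces the sample row, not the actual rule set of any database locale. Both function names are hypothetical.

```python
def sort_bytewise(s: str) -> str:
    """Order by code point: digits, then uppercase, then lowercase (ASCII)."""
    return "".join(sorted(s))


def sort_case_folded(s: str) -> str:
    """Approximate a locale-style order: compare letters case-insensitively,
    and put the lowercase form before the uppercase form of the same letter.
    This is an illustrative assumption, not a real collation implementation."""
    return "".join(sorted(s, key=lambda c: (c.lower(), c.isupper())))


if __name__ == "__main__":
    sample = "tUkmori4D1rHhI1"           # the row from the result sample
    print(sort_case_folded(sample))      # prints 114DhHiIkmorrtU (like sort1..sort7)
    print(sort_bytewise(sample))         # prints 114DHIUhikmorrt (like sort8)
```

The byte-wise version needs no key function at all, which hints at why skipping collation rules is cheaper.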