PostgreSQL function confusion

Date: 2012-03-13 19:46:20

Tags: sql function postgresql

If I write a query like this:

with WordBreakDown (idx, word, wordlength) as (
    select 
        row_number() over () as idx,
        word,
        character_length(word) as wordlength
    from
    unnest(string_to_array('yo momma so fat', ' ')) as word
)
select 
    cast(wbd.idx + (
        select SUM(wbd2.wordlength)
        from WordBreakDown wbd2
        where wbd2.idx <= wbd.idx
        ) - wbd.wordlength as integer) as position,
    cast(wbd.word as character varying(512)) as part
from
    WordBreakDown wbd;  

...I get a table with 4 rows:

1;"yo"
4;"momma"
10;"so"
13;"fat"

...which is exactly what I want. HOWEVER, if I wrap it in a function like this:

drop type if exists split_result cascade;
create type split_result as(
    position integer,
    part character varying(512)
);

drop function if exists split(character varying(512), character(1));    
create function split(
    _s character varying(512), 
    _sep character(1)
    ) returns setof split_result as $$
begin

    return query
    with WordBreakDown (idx, word, wordlength) as (
        select 
            row_number() over () as idx,
            word,
            character_length(word) as wordlength
        from
        unnest(string_to_array(_s, _sep)) as word
    )
    select 
        cast(wbd.idx + (
            select SUM(wbd2.wordlength)
            from WordBreakDown wbd2
            where wbd2.idx <= wbd.idx
            ) - wbd.wordlength as integer) as position,
        cast(wbd.word as character varying(512)) as part
    from
        WordBreakDown wbd;  

end;
$$ language plpgsql;

select * from split('yo momma so fat', ' ');

...I get:

1;"yo momma so fat"

I'm scratching my head here. What did I mess up?

UPDATE: Following the suggestions below, I have replaced the function with this:

CREATE OR REPLACE FUNCTION split(_string character varying(512), _sep character(1))
  RETURNS TABLE (postition int, part character varying(512)) AS
$BODY$
BEGIN
    RETURN QUERY
    WITH wbd AS (
        SELECT (row_number() OVER ())::int AS idx
              ,word
              ,length(word) AS wordlength
        FROM   unnest(string_to_array(_string, rpad(_sep, 1))) AS word
        )
    SELECT (sum(wordlength) OVER (ORDER BY idx))::int + idx - wordlength
          ,word::character varying(512) -- AS part
    FROM wbd;  
END;
$BODY$ LANGUAGE plpgsql;

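...which keeps my original function signature for maximum compatibility while getting the lion's share of the performance gain. Thanks to the answerers; this turned out to be a multi-faceted learning experience, and your explanations really helped me understand what was going on.

3 answers:

Answer 0: (score: 1)

You have a few constructs that probably don't do what you think they do.

Here is a largely simplified version of your function that is also quite a bit faster: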

CREATE OR REPLACE FUNCTION split(_string text, _sep text)
  RETURNS TABLE (postition int, part text) AS
$BODY$
BEGIN
    RETURN QUERY
    WITH wbd AS (
        SELECT (row_number() OVER ())::int AS idx
              ,word
              ,length(word) AS wordlength
        FROM   unnest(string_to_array(_string, _sep)) AS word
        )
    SELECT (sum(wordlength) OVER (ORDER BY idx))::int + idx - wordlength
          ,word -- AS part
    FROM wbd;  
END;
$BODY$ LANGUAGE plpgsql;

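Explanation

  • Use another window function to sum up the word lengths. Faster, simpler, cleaner. This accounts for most of the performance gain. Lots of sub-queries slow you down.

  • Use the data type text instead of character varying or even character(). character varying and character are awkward types that mostly exist for compatibility with the SQL standard and for historical reasons. There is hardly anything you can do with them that you couldn't do with text. In the meantime, @Tometzky has explained why character(1) is a particularly bad choice for the parameter type. I fixed that by using text instead.

  • As @Tometzky said, unnest(string_to_array(..)) is faster than regexp_split_to_table(..) - even if only by a little for the small strings we use here (at most 512 characters). So I switched back to your original expression.

  • length() does the same as character_length().

  • In a query with only one table source (and no other possible naming conflicts) you might as well not table-qualify the column names. That simplifies the code.

  • We need an integer value in the end, so I cast all numeric values (bigint in this case) to integer right away. Additions and subtractions are then done with integer arithmetic, which is generally fastest.
    'value'::int is just shorter syntax for cast('value' as integer) and is otherwise equivalent.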
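
To see the first and last points in isolation, here is a minimal sketch of the same CTE run as a plain query against the sample string from the question. The column names running_length and start_pos are just illustrative names I made up, and the sketch assumes (as the function above does) that row_number() OVER () numbers the words in the order unnest() emits them:

```sql
-- Running total of word lengths with a window function
-- instead of a correlated subquery per row.
WITH wbd AS (
    SELECT (row_number() OVER ())::int AS idx
          ,word
          ,length(word) AS wordlength
    FROM   unnest(string_to_array('yo momma so fat', ' ')) AS word
    )
SELECT idx
      ,word
      ,(sum(wordlength) OVER (ORDER BY idx))::int AS running_length  -- illustrative name
      ,(sum(wordlength) OVER (ORDER BY idx))::int + idx - wordlength AS start_pos  -- illustrative name
FROM   wbd
ORDER  BY idx;
```

For this input, start_pos should come out as 1, 4, 10, 13, matching the result shown in the question. The separator here is a text literal, so the character(1) problem does not come into play.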

Answer 1: (score: 1)

Observe:

select length(' '::character(1));
 length
--------
      0
(1 row)

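The cause of this confusion is the strange definition of the character type in the SQL standard. From the Postgres documentation for character types:

Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, the padding spaces are treated as semantically insignificant. Trailing spaces are disregarded when comparing two values of type character, and they will be removed when converting a character value to one of the other string types.

So the space you pass as _sep is stripped when it is converted to text for string_to_array(), which leaves an empty separator. You should use string_to_array(_s, rpad(_sep,1)) instead.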
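
As a rough sketch of what happens inside the function, the following query compares the stripped and the re-padded separator side by side. The column aliases are made up for illustration; the comments state what I would expect from the documented behaviour quoted above:

```sql
SELECT length(' '::character(1)::text)    AS len_after_cast  -- 0: the space is dropped by the cast to text
      ,length(rpad(' '::character(1), 1)) AS len_after_rpad  -- 1: rpad() pads the empty string back to ' '
      ,array_length(string_to_array('yo momma so fat', ' '::character(1)::text), 1)
                                          AS parts_stripped  -- 1: empty separator, whole string is one element
      ,array_length(string_to_array('yo momma so fat', rpad(' '::character(1), 1)), 1)
                                          AS parts_padded;   -- 4: splits on the restored space
```

The one-element result for the stripped separator corresponds to the single row 1;"yo momma so fat" that the function in the question returns.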

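Answer 2: (score: 0)

I found the answer, but I don't understand it.

The string_to_array(_s, _sep) function does not split when the separator is a non-varying character (character(1)); it doesn't work even if I write it like this: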

string_to_array(_s, cast(_sep as character varying(1)))

But if I redefine the parameters like this:

drop function if exists split(character varying(512), character(1));    
create function split(
    _s character varying(512), 
    _sep character varying(1)

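...suddenly it works the way I expected. I don't know what to make of it, and it really isn't the answer I was hoping for... I have now changed the function's signature, which is not what I wanted to do.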
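
For what it's worth, the difference between the two parameter types can be checked directly. This is only a small sketch with made-up column aliases, based on the documented rule that trailing spaces are removed when a character value is converted to another string type:

```sql
SELECT length(' '::character varying(1))  AS varchar_len       -- 1: character varying keeps the space
      ,length(' '::character(1))          AS char_len          -- 0: the padding space does not count
      ,length(cast(' '::character(1) AS character varying(1)))
                                          AS char_to_varchar;  -- 0: casting afterwards comes too late
```

That would explain both observations: a character varying(1) parameter still holds the space, while casting a character(1) parameter inside the function only converts a value whose space is already treated as insignificant.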