Hex string to integer conversion in Amazon Redshift

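Asked: 2014-01-02 17:43:58

Tags: sql postgresql amazon-web-services hex amazon-redshift

Amazon Redshift is based on ParAccel, which is in turn based on Postgres. From my research, the preferred way to convert a hex string to an integer in Postgres appears to be via a bit field, as outlined in this answer.

For bigint, this would be: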

select ('x'||lpad('123456789abcdef',16,'0'))::bit(64)::bigint

Unfortunately, this fails on Redshift:

ERROR: cannot cast type text to bit [SQL State=42846] 

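What other ways are there to perform this conversion in Postgres 8.1-ish (close to the Redshift compatibility level)? Redshift supports neither UDFs nor arrays, regex functions, or set-generating functions...

3 answers:

Answer 0: (score: 6)

It looks like they added a function for this at some point: STRTOL

Syntax

STRTOL(num_string, base)

Return type

BIGINT. If num_string is null, returns NULL.

For example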

SELECT strtol('deadbeef', 16);

Returns: 3735928559
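
The name STRTOL suggests the C standard library's strtol family. As a rough sketch of what a base-16 parse involves, assuming similar semantics (an assumption, not something the answer confirms), here is a small C helper. The name parse_hex_bigint is hypothetical and used only for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a hex string into a signed 64-bit value.
 * Returns 0 on success, -1 on malformed input or overflow. */
int parse_hex_bigint(const char *s, long long *out)
{
    char *end;
    long long v;

    errno = 0;                      /* strtoll reports overflow via errno */
    v = strtoll(s, &end, 16);       /* base 16; C also accepts a leading "0x" */
    if (end == s || *end != '\0')   /* nothing parsed, or trailing garbage */
        return -1;
    if (errno == ERANGE)            /* value does not fit in long long */
        return -1;
    *out = v;
    return 0;
}
```

With this sketch, "deadbeef" parses to 3735928559, matching the Redshift result above, while a string containing a non-hex character is rejected.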

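Answer 1: (score: 3)

Assuming you want a simple digit-by-digit positional conversion (i.e. you're not worried about two's complement negatives and so on), I think this should work on an 8.1-equivalent database: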

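CREATE OR REPLACE FUNCTION hex2dec(text) RETURNS bigint AS $$
-- Sum digit_value * 16^position over every character of the input.
SELECT sum(CASE WHEN v >= ascii('a') THEN v - ascii('a') + 10 ELSE v - ascii('0') END * 16^ordpos)::bigint
FROM (
    -- Reverse the string so position 0 is the least significant digit;
    -- lower() makes uppercase A-F decode the same as a-f.
    SELECT n-1, ascii(substring(reverse(lower($1)), n, 1))
    FROM generate_series(1, length($1)) n
) AS x(ordpos, v);
$$ LANGUAGE sql IMMUTABLE;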

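The function form is optional; it just makes it easier to avoid repeating the argument several times. It should get inlined anyway. Efficiency will probably be awful, but most of the tools you could use for this don't seem to be available in versions that old, and this at least works: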

regress=> CREATE TABLE t AS VALUES ('c13b'), ('a'), ('f');
regress=> SELECT hex2dec(column1) FROM t;
 hex2dec 
---------
   49467
      10
      15
(3 rows)
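
For readers who find the SQL dense, the same positional sum can be sketched in C. This is only an illustration of the arithmetic, not part of the answer's solution; the function name hex2dec_positional is hypothetical, and the sketch assumes the input contains only the characters 0-9 and a-f and has at most 15 digits.

```c
#include <string.h>

/* Hypothetical illustration of the SQL above: sum digit * 16^position,
 * walking the string from its last character (position 0) to its first.
 * Assumes only 0-9 and a-f, and at most 15 digits so the sum fits. */
long long hex2dec_positional(const char *hex)
{
    long long total = 0;
    long long weight = 1;               /* 16^ordpos, starting at 16^0 */
    size_t len = strlen(hex);

    for (size_t n = 0; n < len; n++) {
        /* Same role as ascii(substring(reverse($1), n, 1)) in the SQL */
        int v = (unsigned char) hex[len - 1 - n];
        int digit = (v >= 'a') ? v - 'a' + 10 : v - '0';
        total += digit * weight;
        weight *= 16;
    }
    return total;
}
```

For 'c13b' this computes 11*1 + 3*16 + 1*256 + 12*4096 = 49467, the first row of the output above.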

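If you could use regexp_split_to_array and generate_subscripts, it might be faster. Or slower. I haven't tried it. Another possible trick is to use a digit-mapping array instead of the CASE, like: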

'[48:102]={0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,11,12,13,14,15}'::integer[]

which you can use with:

CREATE OR REPLACE FUNCTION hex2dec(text) RETURNS bigint AS $$
SELECT sum(
  ('[48:102]={0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,11,12,13,14,15}'::integer[])[ v ]
  * 16^ordpos
)::bigint
FROM (
    SELECT n-1, ascii(substring(reverse(lower($1)), n, 1))
    FROM generate_series(1, length($1)) n
) AS x(ordpos, v);
$$ LANGUAGE sql IMMUTABLE;
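
The mapping-array idea translates directly to a lookup table indexed by character code. The following C sketch mirrors the [48:102] array from the SQL; the names hex_value and hex2dec_table are hypothetical, and the code assumes every input character lies between '0' and 'f' (anything else would index outside the table). Unlike the SQL, it accumulates left to right, multiplying the running total by 16 at each step, which gives the same result as summing digit * 16^position.

```c
/* Hypothetical lookup table covering character codes '0' (48) to 'f' (102).
 * Entries not listed explicitly are zero, as in the SQL array. */
static const signed char hex_value[] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,           /* '0'..'9' */
    ['a' - '0'] = 10, 11, 12, 13, 14, 15    /* 'a'..'f' */
};

/* Assumes every character is in the range '0'..'f' and that the
 * result fits in a signed 64-bit integer (at most 15 hex digits). */
long long hex2dec_table(const char *hex)
{
    long long total = 0;

    for (const char *p = hex; *p != '\0'; p++) {
        /* Shift the running total one hex digit left, then add the new digit */
        total = total * 16 + hex_value[*p - '0'];
    }
    return total;
}
```

The table replaces the CASE branch with a single indexed read per character; whether that is faster in SQL is untested, as the answer says.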

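Personally, I'd do it client-side rather than wrangling the limited capabilities of an old PostgreSQL fork, especially one where you can't load your own sensible user-defined C functions, or use PL/Perl, etc.

In real PostgreSQL I'd just use this:

hex2dec.c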

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include <errno.h>
#include <stdlib.h>

PG_MODULE_MAGIC;

Datum hex2dec(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(hex2dec);

Datum
hex2dec(PG_FUNCTION_ARGS)
{
    char *endpos;
    const char *hexstr = text_to_cstring(PG_GETARG_TEXT_PP(0));
    long long decval;

    /* strtoll only sets errno on failure, so clear any stale value first */
    errno = 0;
    /* Base 16 accepts an optional leading "0x"; strtoll is 64-bit on all platforms */
    decval = strtoll(hexstr, &endpos, 16);
    /* Reject input with trailing characters that are not hex digits */
    if (endpos[0] != '\0')
    {
        ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("Could not decode input string %s as hex", hexstr)));
    }
    /* ERANGE means the value did not fit in a signed 64-bit integer */
    if (errno == ERANGE)
    {
        ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("Input hex string %s overflows int64", hexstr)));
    }
    PG_RETURN_INT64(decval);
}

Makefile

MODULES = hex2dec
DATA = hex2dec--1.0.sql
EXTENSION = hex2dec

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

hex2dec.control

comment = 'Utility function to convert hex strings to decimal'
default_version = '1.0'
module_pathname = '$libdir/hex2dec'
relocatable = true

hex2dec--1.0.sql

CREATE OR REPLACE FUNCTION hex2dec(hexstr text) RETURNS bigint
        AS 'hex2dec','hex2dec'
        LANGUAGE c IMMUTABLE STRICT;

COMMENT ON FUNCTION hex2dec(hexstr text)
IS 'Decode the hex string passed, which may optionally have a leading 0x, as a bigint. Does not attempt to consider negative hex values.';

Usage:

CREATE EXTENSION hex2dec;

postgres=# SELECT hex2dec('7fffffffffffffff');
       hex2dec       
---------------------
 9223372036854775807
(1 row)

postgres=# SELECT hex2dec('deadbeef');
  hex2dec   
------------
 3735928559
(1 row)

postgres=# SELECT hex2dec('12345');
 hex2dec 
---------
   74565
(1 row)

postgres=# select hex2dec(to_hex(-1));
  hex2dec   
------------
 4294967295
(1 row)

postgres=# SELECT hex2dec('8fffffffffffffff');
ERROR:  Input hex string 8fffffffffffffff overflows int64

postgres=# SELECT hex2dec('0x7abcz123');
ERROR:  Could not decode input string 0x7abcz123 as hex
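
The to_hex(-1) example above returns 4294967295 because the function treats 'ffffffff' as a plain positive number, consistent with the note that negative values are not considered. If a signed 32-bit reading is wanted instead, the reinterpretation is a single subtraction. A minimal C sketch of that step follows; the helper name as_signed32 is hypothetical and it assumes the input is already in the range 0 to 0xffffffff.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret an unsigned 32-bit pattern as a
 * two's complement signed value. Assumes 0 <= v <= 0xffffffff. */
int64_t as_signed32(int64_t v)
{
    /* Patterns with the top bit set represent v - 2^32 */
    return (v >= INT64_C(0x80000000)) ? v - INT64_C(0x100000000) : v;
}
```

Applied to 4294967295 this yields -1, while smaller values such as 74565 pass through unchanged.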

The performance difference is ... noteworthy. Given sample data:

CREATE TABLE randhex AS 
SELECT '0x'||to_hex( abs(random() * (10^((random()-.5)*10)) * 10000000)::bigint) AS h
FROM generate_series(1,1000000);

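Converting from hex to decimal with the C extension, from a warm cache, takes about 1.3 seconds, which isn't great for a million rows. Reading the rows without any conversion takes 0.95 seconds. The SQL-based hex2dec approach took 36 seconds to process the same rows. Frankly, I'm impressed that the SQL approach is as fast as that, and surprised that the C extension is that slow.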

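Answer 2: (score: 1)

A likely explanation is that the cast from text to bit(n) relies on undocumented behavior. I repeat the quote from Tom Lane:

This relies on some undocumented behavior of the bit-type input converter, but I see no reason to expect that would break. A possibly bigger problem is that it requires PG >= 8.3 since there wasn't a text to bit cast before that.

The Amazon derivative apparently does not allow this undocumented feature. That is not surprising, since it is based on Postgres 8.1, where the cast does not exist at all.

The closely related answer quoted earlier:
Convert hex in text representation to decimal number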