MS SQL: how to convert a two-byte nvarchar to a string?

Posted: 2011-06-07 12:41:32

Tags: tsql utf-8 decode cjk utf8-decode

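My nvarchar(4000) field contains data like this:

D0B6D181D0B5D0B4D0BA35D0BC (Cyrillic string)
E59EA0E78999E79B98E99499 (Chinese string)
...

Each character is represented by two bytes in the data sequence. How can I convert this data to a string using T-SQL?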

2 answers:

Answer 0 (score: 1)

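Your data is UTF-8 encoded text (Cyrillic in the first example, Chinese in the second) that has been converted to a hex string and stored in an nvarchar(4000) field in SQL Server. That is an unusual combination. Note also that UTF-8 is a variable-width encoding, so not every character takes two bytes: each Cyrillic letter here takes two, the ASCII digit 5 takes one, and each Chinese character takes three. Strangely, SQL Server has no native T-SQL support for converting UTF-8 to nvarchar. You can roll your own DecodeUTF8 function or use the one below.

Your examples: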

select
  Cyrillic = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'D0B6D181D0B5D0B4D0BA35D0BC', 1))
  , Chinese = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'E59EA0E78999E79B98E99499', 1))

Output:

Cyrillic  Chinese 
жседк5м   垠牙盘错
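To cross-check that output outside SQL Server, here is a minimal Python sketch (Python is not part of the original answer, and the helper names decode_hex_utf8 and utf8_widths are hypothetical). It turns the hex digits back into bytes, decodes them as UTF-8, and reports how many bytes each character used, which shows why "two bytes per character" only holds for the Cyrillic letters.

```python
def decode_hex_utf8(hex_text):
    """Hypothetical helper: turn a hex dump of UTF-8 bytes (no '0x' prefix) into text."""
    return bytes.fromhex(hex_text).decode("utf-8")


def utf8_widths(text):
    """Hypothetical helper: number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


# The two hex strings from the question.
for hex_text in ("D0B6D181D0B5D0B4D0BA35D0BC", "E59EA0E78999E79B98E99499"):
    text = decode_hex_utf8(hex_text)
    # Cyrillic letters use 2 bytes, ASCII '5' uses 1, CJK characters use 3.
    print(text, utf8_widths(text))
```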

My UTF-8 decoder for T-SQL:

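create function [dbo].[DecodeUTF8](@utf8 varchar(max)) returns nvarchar(max)
as
begin
    declare @xml xml;

    -- Inline tally table: e2 = 16 rows, e3 = 256, e4 = 65,536, e5 = up to 2^31-1 numbered rows;
    -- numbers(i) keeps one row per input byte (i = 1 .. datalength(@utf8)).
    with e2(n) as (select top(16) 0 from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n))
    , e3(n) as (select top(256) 0 from e2, e2 e)
    , e4(n) as (select top(65536) 0 from e3, e3 e)
    , e5(n) as (select top(power(2.,31)-1) row_number() over (order by(select 0)) from e4, e4 e)
    , numbers(i) as (select top(datalength(@utf8)) row_number() over (order by(select 0)) from e5)
    -- One row per byte: its value, n, the sequence length it starts, and its payload bits.
    , x as (
        select *
        from numbers
        -- i-th byte of the input as a number 0..255
        cross apply (select byte = convert(tinyint, convert(binary(1), substring(@utf8, i, 1)))) c
        -- n = index of the highest 0 bit: 7 for ASCII (0xxxxxxx), 6 for continuation bytes (10xxxxxx),
        -- 5/4/3 for lead bytes 110xxxxx / 1110xxxx / 11110xxx (two-argument LOG needs SQL Server 2012+)
        cross apply (select n = floor(log(~(byte) * 2 + 1, 2)) - 1) d
        -- sequence length: 2, 3 or 4 for lead bytes, otherwise 1
        cross apply (select bytes = case when n in (5,4,3) then 7 - n else 1 end) e
        -- payload = the low n bits of the byte
        cross apply (select data = byte % power(2, n)) f
    )
    select @xml =
    (
        -- A lead byte's payload is shifted left 6 bits per continuation byte and the continuation
        -- payloads are added. NCHAR returns NULL above 0xFFFF unless the database collation is
        -- supplementary-character (_SC) aware, so 4-byte characters may be dropped.
        select nchar(case x.bytes
            when 1 then x.data
            when 2 then power(2, 6) * x.data + x2.data 
            when 3 then power(2, 6*2) * x.data + power(2, 6) * x2.data + x3.data
            when 4 then power(2, 6*3) * x.data + power(2, 6*2) * x2.data + power(2, 6) * x3.data + x4.data
          end)
        from x
        left join x x2 on x2.i = x.i + 1 and x.bytes > 1
        left join x x3 on x3.i = x.i + 2 and x.bytes > 2
        left join x x4 on x4.i = x.i + 3 and x.bytes > 3
        -- continuation bytes (n = 6) are consumed by their lead byte
        where x.n <> 6
        order by x.i
        -- concatenate the decoded characters in byte order
        for xml path('')
    );

    -- .value() turns the XML back into plain text, decoding entities such as &amp;
    return @xml.value('.', 'nvarchar(max)');
end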
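For readers who want to follow the bit arithmetic, here is the same logic restated as a minimal Python sketch (decode_utf8_like_tsql is a hypothetical name, not part of the answer). It walks the bytes in a loop instead of as a set, and, like the T-SQL version, it assumes valid and complete UTF-8 input.

```python
import math


def decode_utf8_like_tsql(data):
    """Hypothetical re-statement of the DecodeUTF8 arithmetic; assumes valid UTF-8."""
    rows = []
    for byte in data:
        # n = floor(log2(~byte * 2 + 1)) - 1, with ~ taken over 8 bits as for tinyint:
        # 7 for 0xxxxxxx, 6 for 10xxxxxx, 5/4/3 for 110xxxxx / 1110xxxx / 11110xxx.
        n = math.floor(math.log2((~byte & 0xFF) * 2 + 1)) - 1
        width = 7 - n if n in (5, 4, 3) else 1  # bytes in the sequence this byte starts
        payload = byte % (2 ** n)               # the low n bits carry the data
        rows.append((n, width, payload))

    chars = []
    for i, (n, width, payload) in enumerate(rows):
        if n == 6:  # continuation byte: already consumed by its lead byte
            continue
        code_point = payload
        for k in range(1, width):
            # append the next continuation byte's 6 payload bits
            code_point = code_point * 64 + rows[i + k][2]
        chars.append(chr(code_point))
    return "".join(chars)


print(decode_utf8_like_tsql(bytes.fromhex("D0B6D181D0B5D0B4D0BA35D0BC")))
```

In practice Python's built-in bytes.decode("utf-8") does the same job with full validation; the sketch only makes the arithmetic easy to check.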

Answer 1 (score: 0)

Take a look at this:

http://devio.wordpress.com/2009/07/11/convert-unicode-hex-codepoint-to-unicode-character-in-sql-server/

It looks like they are handling single-byte code points, so you may need to modify it slightly.
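The distinction between code points and encoded bytes is what makes that modification necessary. As a hedged illustration, assuming the linked approach maps each hex group directly to a Unicode code point (not verified here), this Python sketch with a hypothetical helper decode_hex_code_points shows that the question's data only decodes correctly when read as UTF-8 bytes.

```python
def decode_hex_code_points(hex_text, digits=4):
    """Hypothetical helper: read hex_text as fixed-width Unicode code points."""
    return "".join(
        chr(int(hex_text[i:i + digits], 16))
        for i in range(0, len(hex_text), digits)
    )


# "0436" is the code point of Cyrillic "ж", so the code-point reading works here...
print(decode_hex_code_points("0436"))
# ...but "D0B6" is the UTF-8 byte encoding of "ж"; read as a code point it becomes
# U+D0B6, an unrelated Hangul syllable.
print(hex(ord(decode_hex_code_points("D0B6"))))
# Decoding the same digits as UTF-8 bytes gives the intended character.
print(bytes.fromhex("D0B6").decode("utf-8"))
```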