我的nvarchar(4000)字段包含如下数据:
D0B6D181D0B5D0B4D0BA35D0BC (cyrillic string)
E59EA0E78999E79B98E99499 (chinese string)
...
每个字符由数据序列中的两个字节表示。 如何使用T-SQL将此数据转换为字符串?
答案 0 :(得分:1)
您的第一个示例是UTF8编码的西里尔文本,该文本已转换为十六进制字符串并存储在SQL Server的nvarchar(4000)字段中。这是一个奇怪的组合。奇怪的是,SQL Server没有原生支持在TSQL中将UTF8转换为nvarchar。您可以滚动自己的DecodeUTF8功能,也可以使用下面的。
您的示例:
select
Cyrillic = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'D0B6D181D0B5D0B4D0BA35D0BC', 1))
, Chinese = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'E59EA0E78999E79B98E99499', 1))
输出:
Cyrillic Chinese
жседк5м 垠牙盘错
我的TSQL的UTF8解码器:
create function [dbo].[DecodeUTF8](@utf8 varchar(max)) returns nvarchar(max)
as
begin
declare @xml xml;
with e2(n) as (select top(16) 0 from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n))
, e3(n) as (select top(256) 0 from e2, e2 e)
, e4(n) as (select top(65536) 0 from e3, e3 e)
, e5(n) as (select top(power(2.,31)-1) row_number() over (order by(select 0)) from e4, e4 e)
, numbers(i) as (select top(datalength(@utf8)) row_number() over (order by(select 0)) from e5)
, x as (
select *
from numbers
cross apply (select byte = convert(tinyint, convert(binary(1), substring(@utf8, i, 1)))) c
cross apply (select n = floor(log(~(byte) * 2 + 1, 2)) - 1) d
cross apply (select bytes = case when n in (5,4,3) then 7 - n else 1 end) e
cross apply (select data = byte % power(2, n)) f
)
select @xml =
(
select nchar(case x.bytes
when 1 then x.data
when 2 then power(2, 6) * x.data + x2.data
when 3 then power(2, 6*2) * x.data + power(2, 6) * x2.data + x3.data
when 4 then power(2, 6*3) * x.data + power(2, 6*2) * x2.data + power(2, 6) * x3.data + x4.data
end)
from x
left join x x2 on x2.i = x.i + 1 and x.bytes > 1
left join x x3 on x3.i = x.i + 2 and x.bytes > 2
left join x x4 on x4.i = x.i + 3 and x.bytes > 3
where x.n <> 6
order by x.i
for xml path('')
);
return @xml.value('.', 'nvarchar(max)');
end
答案 1 :(得分:0)
看看这个
看起来他们正在做单字节码点,所以你可能需要稍微修改一下