How do I compare single-byte and multi-byte strings?

Asked: 2018-05-31 08:31:23

Tags: oracle plsql

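I have two strings: one is input from the client, the other is data in a table. The two strings look the same, but they produce different hex values when I pass them through CAST_TO_RAW.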

SELECT UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0A41544258484E') INPUT,
       UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E') DTA
  FROM DUAL;
/*input and data seem same*/

Assume the two strings are the same. How can I get past this and compare them in a query:

SELECT A.DATA,
       A.ORTHER_COL
  FROM MYTABLE A
 WHERE A.DATA = INPUT;

I tried TO_SINGLE_BYTE, but it does not work (because the LENGTHB values differ):

SELECT *
  FROM DUAL
 WHERE TO_SINGLE_BYTE(UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0A41544258484E')) =
       TO_SINGLE_BYTE(UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E'));
/*return null*/

1 Answer:

Answer 0: (score: 4)

The two strings are not the same; the second has an extra 000D (a carriage return) in the middle:

'3539352F47502D0A41544258484E'
             ^^^^
'003500390035002F00470050002D000D000A00410054004200580048004E'
                           ^^  ^^  ^^
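
To spot this kind of difference in real data, one option is to look at the stored bytes with DUMP. The query below is only a sketch that reuses the two literals from the question; the column aliases are made up for illustration. Format 1016 asks for hexadecimal output plus the character set name.

```sql
-- Show the byte-level representation of both values side by side.
-- 16 = hexadecimal notation; adding 1000 also reports the character set.
SELECT DUMP(UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0A41544258484E'), 1016) AS INPUT_DUMP,
       DUMP(UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E'), 1016) AS DATA_DUMP
  FROM DUAL;
```

Against a real table, the same call can be applied directly to the column, for example DUMP(A.DATA, 1016).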

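If they really were the same, you could compare them using implicit conversion (here 0D has been added to the first string, though you would probably prefer to remove it from the second):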

SELECT *
  FROM DUAL
 WHERE UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0D0A41544258484E') =
       UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E');

D
-
X

Or cast explicitly to nvarchar2:

SELECT *
  FROM DUAL
 WHERE cast(UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0D0A41544258484E') as nvarchar2(2000)) =
       UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E');

D
-
X

Or the other way around:

SELECT *
  FROM DUAL
 WHERE UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0D0A41544258484E') =
       cast(UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E') as varchar2(4000));

D
-
X
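
If you would rather remove the carriage return from the second string than add it to the first, a minimal sketch is to strip it at query time with REPLACE. This assumes that every CR (NCHR(13)) in the value can be discarded, not only those that precede an LF.

```sql
-- Compare the unmodified input (LF only) with the table value after
-- removing all carriage returns from it.
-- REPLACE with two arguments deletes every occurrence of the search string.
SELECT *
  FROM DUAL
 WHERE UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0A41544258484E') =
       REPLACE(UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E'), NCHR(13));
```

Applied to a table column, wrapping the column in REPLACE will generally stop a plain index on that column from being used, so it is worth checking the execution plan.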

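Depending on the actual data and on your two database character sets (the database character set and the national character set), you may see some odd results from these conversions.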
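
To see which two character sets are involved, you can query the database NLS parameters. This is just a quick check and does not change anything:

```sql
-- NLS_CHARACTERSET applies to VARCHAR2/CHAR data,
-- NLS_NCHAR_CHARACTERSET applies to NVARCHAR2/NCHAR data.
SELECT PARAMETER, VALUE
  FROM NLS_DATABASE_PARAMETERS
 WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```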

From Oracle 12c you can use a UCA linguistic collation that ignores the difference between LF and CRLF, for example:

alter session set nls_sort = 'UCA0700_ORADUCET_S1';
alter session set nls_comp = 'LINGUISTIC';

SELECT *
  FROM DUAL
 WHERE UTL_RAW.CAST_TO_VARCHAR2('3539352F47502D0A41544258484E') =
       UTL_RAW.CAST_TO_NVARCHAR2('003500390035002F00470050002D000D000A00410054004200580048004E');

D
-
X

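You would need to check what effect that has on performance, and whether other ignorable characters could cause false matches. If you want to ignore the LF/CRLF difference, you may be better off sanitizing your data so that it is consistent anyway.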
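
As a rough sketch of what that sanitizing could look like, the statement below rewrites CRLF as LF in the stored values. It assumes that MYTABLE.DATA from the question is an NVARCHAR2 column and that LF-only line endings are the form you want to keep; try it on a copy of the data first.

```sql
-- Normalize line endings in the stored data: CRLF becomes LF.
-- Only rows that actually contain a CRLF pair are touched.
UPDATE MYTABLE
   SET DATA = REPLACE(DATA, NCHR(13) || NCHR(10), NCHR(10))
 WHERE INSTR(DATA, NCHR(13) || NCHR(10)) > 0;
-- Review the affected rows before issuing COMMIT.
```

The client input would need the same normalization before it is compared or inserted, otherwise the mismatch will come back.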