Question

What is the best way to convert a Delphi XE AnsiString containing escaped combining diacritical marks, such as "Fu\u0308rst", into a friendly WideString "Fürst"?

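I know this is not always possible for every combination, but at least the common Latin blocks should be supported without building a silly conversion table myself. I guess the solution can be found in the new Character unit, but I couldn't work it out.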

Answer 1

I think you need to perform Unicode normalization on the string.

I don't know whether the Delphi XE RTL has a specific call for this, but the WinAPI function NormalizeString should help, using the NormalizationKC mode:

NormalizationKC

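Unicode normalization form KC, compatibility composition. Transforms each base plus combining characters to the canonical precomposed equivalent and all compatibility characters to their equivalents. For example, the ligature ﬁ becomes f + i; similarly, A + ¨ + ﬁ + n becomes Ä + f + i + n.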

Answer 2

Here is the complete code that solved my problem:

function Unescape(const s: AnsiString): string;
var
  i: Integer;
  j: Integer;
  c: Integer;
begin
  // Make result at least large enough. This prevents too many reallocs
  SetLength(Result, Length(s));
  i := 1;
  j := 1;
  while i <= Length(s) do begin
    if s[i] = '\' then begin
      if i < Length(s) then begin
        // escaped backslash?
        if s[i + 1] = '\' then begin
          Result[j] := '\';
          inc(i, 2);
        end
        // convert hex number to WideChar
        else if (s[i + 1] = 'u') and (i + 1 + 4 <= Length(s)) 
                and TryStrToInt('$' + string(Copy(s, i + 2, 4)), c) then begin
          inc(i, 6);
          Result[j] := WideChar(c);
        end else begin
          raise Exception.CreateFmt('Invalid code at position %d', [i]);
        end;
      end else begin
        raise Exception.Create('Unexpected end of string');
      end;
    end else begin
      Result[j] := WideChar(s[i]);
      inc(i);
    end;
    inc(j);
  end;

  // Trim result in case we reserved too much space
  SetLength(Result, j - 1);
end;

const
  NormalizationC = 1;

function NormalizeString(NormForm: Integer; lpSrcString: LPCWSTR; cwSrcLength: Integer;
 lpDstString: LPWSTR; cwDstLength: Integer): Integer; stdcall; external 'Normaliz.dll';

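function Normalize(const s: string): string;
var
  newLength: Integer;
begin
  if s = '' then
    Exit('');
  // Ask the API for an estimated buffer size: even in NormalizationC mode a
  // few characters expand, so Length(s) is not a guaranteed upper bound
  newLength := NormalizeString(NormalizationC, PChar(s), Length(s), nil, 0);
  if newLength > 0 then begin
    SetLength(Result, newLength);
    newLength := NormalizeString(NormalizationC, PChar(s), Length(s), PChar(Result), Length(Result));
  end;
  // A return value <= 0 signals failure (for example invalid Unicode input)
  if newLength <= 0 then
    RaiseLastOSError;
  SetLength(Result, newLength);
end;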

function UnescapeAndNormalize(const s: AnsiString): string;
begin
  Result := Normalize(Unescape(s));
end;

Thank you all! I'm sure my first time using StackOverflow won't be my last :-)

Answer 3

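Are they always escaped like this? Always 4 digits?

How is the \ character itself escaped?

Assuming the \ character is escaped as \uxxxx, where xxxx is the code of the \ character, you can simply walk through the string: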

function Unescape(s: AnsiString): WideString;
var
  i: Integer;
  j: Integer;
  c: Integer;
begin
  // Make result at least large enough. This prevents too many reallocs
  SetLength(Result, Length(s));
  i := 1; j := 1;
  while i <= Length(s) do
  begin
     // If a '\' is found, typecast the following 4 digit integer to widechar
     if s[i] = '\' then
     begin
       if (s[i+1] <> 'u') or not TryStrToInt(Copy(s, i+2, 4), c) then
         raise Exception.CreateFmt('Invalid code at position %d', [i]);

       Inc(i, 6);
       Result[j] := WideChar(c);
     end
     else
     begin
       Result[j] := WideChar(s[i]);
       Inc(i);
     end;
     Inc(j);
  end;

  // Trim result in case we reserved too much space
  SetLength(Result, j-1);
end;

Use it like this:

  MessageBoxW(0, PWideChar(Unescape('\u0252berhaupt')), nil, MB_OK);

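This code was tested in Delphi 2007, but because it explicitly uses AnsiString and WideString, it should also work in XE.

[Edit] The code is fine. The syntax highlighter is what fails.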

Answer 4

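If I'm not mistaken, Delphi XE now supports regular expressions. I don't use them often, but they seem like a good way to parse the string and then replace all the escaped values. Maybe someone has a good example of how to do this in Delphi with regular expressions?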
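A rough sketch of what such a regex-based version might look like, assuming the RegularExpressions unit that ships with Delphi XE. The names TEscapeReplacer, ReplaceEscape and UnescapeWithRegex are made up for this illustration. Unlike the hand-written loop in Answer 2, this variant silently leaves malformed escapes untouched instead of raising an exception, and the result would still need to be normalized afterwards.

```pascal
program UnescapeRegexSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  RegularExpressions; // Delphi XE unit name; XE2 and later call it System.RegularExpressions

type
  // TMatchEvaluator is a method pointer ("of object"), so the callback
  // has to be a method of some class. This helper class is hypothetical.
  TEscapeReplacer = class
    function ReplaceEscape(const Match: TMatch): string;
  end;

function TEscapeReplacer.ReplaceEscape(const Match: TMatch): string;
begin
  if Match.Value = '\\' then
    // escaped backslash becomes a single backslash
    Result := '\'
  else
    // group 1 holds the four hex digits; '$' makes StrToInt parse them as hex
    Result := WideChar(StrToInt('$' + Match.Groups[1].Value));
end;

function UnescapeWithRegex(const s: AnsiString): string;
var
  Replacer: TEscapeReplacer;
begin
  Replacer := TEscapeReplacer.Create;
  try
    // Pattern: either two backslashes, or backslash + 'u' + exactly 4 hex digits
    Result := TRegEx.Replace(string(s), '\\\\|\\u([0-9A-Fa-f]{4})',
      Replacer.ReplaceEscape);
  finally
    Replacer.Free;
  end;
end;

begin
  // 6 UTF-16 code units: F, u, U+0308, r, s, t
  Writeln(Length(UnescapeWithRegex('Fu\u0308rst')));
end.
```

Whether this is nicer than the explicit loop is a matter of taste: it is shorter, but the loop gives precise error positions for invalid input.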

Answer 5

GolezTrol, you forgot the '$':

if (s[i+1] <> 'u') or not TryStrToInt('$'+Copy(s, i+2, 4), c) then
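To see why the prefix matters, here is a small stand-alone check (a sketch, not part of the original answers). TryStrToInt reads plain digits as decimal and only switches to hexadecimal when the text starts with '$', so without it the escape digits are misread or rejected.

```pascal
program HexPrefixSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  c: Integer;
begin
  // Without '$' the digits are parsed as decimal: '0308' gives 308
  if TryStrToInt('0308', c) then
    Writeln(c);

  // With '$' they are parsed as hexadecimal: $0308 gives 776, i.e. U+0308
  if TryStrToInt('$0308', c) then
    Writeln(c);

  // Hex digits such as 'FC' are not valid decimal, so the conversion fails
  if not TryStrToInt('00FC', c) then
    Writeln('00FC is rejected without the $ prefix');
end.
```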

Delphi XE AnsiStrings with escaped combining diacritical marks
