Question

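I asked a similar question here: Delphi XE - should I use String or AnsiString?. After deciding that using ANSI strings in my (large) library is the right choice, I realized that I could actually use RawByteString instead of AnsiString. Because I mix Unicode strings with ANSI strings, my code now has quite a few places where it converts between them. However, it looks like I could get rid of those conversions by using RawByteString.

Please let me know your opinion.
Thanks.

Update:
This is disappointing. It looks like the compiler still performs a conversion from RawByteString to string.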

procedure TForm1.FormCreate(Sender: TObject);
var x1, x2: RawByteString;
    s: string;
begin
  x1:= 'a';
  x2:= 'b';
  x1:= x1+ x2;
  s:= x1;              {      <------- Implicit string cast from 'RawByteString' to 'string'     }
end;

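I think it does some work internally (such as copying the data), so my code will not be much faster, and I will still have to add lots of typecasts to my code to silence the compiler.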

Answer 1

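RawByteString is an AnsiString with no code page set by default.

When you assign another string to this RawByteString variable, you copy the code page of the source string. Moving to or from a UnicodeString, as in your code, will still involve a conversion. Sorry.

But there is another use for RawByteString: storing plain byte content (e.g. the content of a database BLOB field, just like an array of byte).

To summarize:

RawByteString should be used as a "code page agnostic" parameter to a method or function;
RawByteString can be used as a variable type to store BLOB data.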
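
As a minimal sketch of the first point, assuming Delphi 2009 or later: the helper CodePageOf and the type TLatin1String below are hypothetical names invented for this illustration, not part of any framework. StringCodePage is the standard System routine. The function accepts any 8-bit string and reports the code page it carries, which shows that the argument arrives unconverted.

```pascal
program RawByteParamSketch;
{$APPTYPE CONSOLE}

type
  // Hypothetical typed 8-bit string, declared only for this illustration
  TLatin1String = type AnsiString(28591);

// Hypothetical helper: because the parameter is a RawByteString, the
// argument is passed as-is, with no code page conversion at the call.
function CodePageOf(const S: RawByteString): Word;
begin
  Result := StringCodePage(S);
end;

var
  U: UTF8String;
  L: TLatin1String;
begin
  U := 'abc';
  L := 'abc';
  Writeln(CodePageOf(U)); // code page carried by the UTF8String argument
  Writeln(CodePageOf(L)); // code page carried by the TLatin1String argument
end.
```

Had the parameter been declared as a plain AnsiString, each call would first convert the argument to the system code page.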

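If you want to reduce conversions and would rather use 8-bit character strings in your application, you should:

Not use the generic AnsiString type, which depends on the current system code page and through which you will lose data;
Rely on UTF-8 encoding, i.e. an 8-bit code page/charset that loses no data when converted to or from a UnicodeString;
Not leave compiler warnings about implicit conversions in place: every conversion should be made explicit;
Use your own dedicated set of functions to handle your UTF-8 content.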
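
The "explicit conversions" advice can be sketched as follows, assuming Delphi 2009 or later, where string is UnicodeString. UTF8Encode and UTF8ToString are standard System routines; the variable names are only illustrative.

```pascal
program ExplicitUtf8Sketch;
{$APPTYPE CONSOLE}

var
  Wide: string;      // UnicodeString (UTF-16) on Delphi 2009+
  Utf8: UTF8String;  // 8-bit payload tagged with the UTF-8 code page
  Back: string;
begin
  Wide := 'hello';
  Utf8 := UTF8Encode(Wide);    // explicit UTF-16 -> UTF-8 conversion
  Back := UTF8ToString(Utf8);  // explicit UTF-8 -> UTF-16 conversion
  Writeln(Back = Wide);        // the round trip preserves the text
end.
```

Each conversion is visible at the point where it happens, so the cost is not hidden behind an implicit cast.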

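That is exactly what we did for our framework. We wanted to use UTF-8 in its kernel because:

We rely on UTF-8 encoded JSON for data transmission;
Memory consumption is smaller;
The SQLite3 engine we use stores text as UTF-8 in its database file;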
We wanted a way of handling Unicode text with no loss of data in all versions of Delphi (from Delphi 6 up to XE), and WideString was not an option, because it is very slow and has the same implicit conversion problem.

But, in order to achieve the best speed, we wrote some optimized functions to handle our custom string type:

{ RawUTF8 is an UTF-8 String stored in an AnsiString
  - use this type instead of System.UTF8String, whose behavior changed
    between Delphi 2009 compiler and previous versions: our implementation
    is consistent and compatible with all versions of Delphi compiler
  - mimic Delphi 2009 UTF8String, without the charset conversion overhead
  - all conversion to/from AnsiString or RawUnicode must be explicit }
{$ifdef UNICODE}
RawUTF8 = type AnsiString(CP_UTF8); // Codepage for an UTF8string
{$else}
RawUTF8 = type AnsiString;
{$endif}

/// our fast RawUTF8 version of Trim(), for Unicode only compiler
// - this Trim() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Trim(const S: RawUTF8): RawUTF8;

/// our fast RawUTF8 version of Pos(), for Unicode only compiler
// - this Pos() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Pos(const substr, str: RawUTF8): Integer; overload; inline;

And we reserved the RawByteString type for handling BLOB data:

{$ifndef UNICODE}
/// define RawByteString, as it does exist in Delphi 2009/2010/XE
// - to be used for byte storage into an AnsiString
// - use this type if you don't want the Delphi compiler to do any
// code page conversions when you assign a typed AnsiString to a RawByteString,
// i.e. a RawUTF8 or a WinAnsiString
RawByteString = AnsiString;
/// pointer to a RawByteString
PRawByteString = ^RawByteString;
{$endif}

/// create a File from a string content
// - uses RawByteString for byte storage, whatever the codepage is
function FileFromString(const Content: RawByteString; const FileName: TFileName;
  FlushOnDisk: boolean=false): boolean;

The source code is available in our repository. In this unit, the UTF-8 related functions are deeply optimized, with both pascal and asm versions for better speed. We sometimes overloaded default functions (like Pos) to avoid conversions. More information about how we handle text in the framework is available here.

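One last word:

If you are sure that your application will only ever contain 7-bit content (no accented characters), you may use the default AnsiString type in your program. But in that case, you should add the AnsiStrings unit to your uses clause, so that you get overloaded string functions that avoid most unwanted conversions.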

Answer 2

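RawByteString is still an "AnsiString". It is best described as a "universal receiver", which means it takes on whatever the source string's code page is at the point of assignment, without forcing a code page conversion. RawByteString is intended to be used only as a function parameter, so that when you call utility functions that take AnsiStrings, you do not incur a conversion between AnsiStrings with different code page affinities.

However, in the case above, you are assigning what is basically an AnsiString to a UnicodeString, which will incur a conversion. It has to convert, because a RawByteString has a payload of 8-bit characters, whereas string (UnicodeString) has a payload of 16-bit characters.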
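
The size difference can be checked with a small sketch, assuming Delphi 2009 or later, where string is UnicodeString. The variable names are illustrative. The explicit cast only removes the compiler warning; the 8-bit to 16-bit conversion still takes place.

```pascal
program PayloadSizeSketch;
{$APPTYPE CONSOLE}

var
  Raw: RawByteString;
  S: string;
begin
  Raw := 'ab';
  S := string(Raw);  // explicit cast: no warning, but still a conversion
  Writeln(SizeOf(AnsiChar));              // bytes per character, 8-bit strings
  Writeln(SizeOf(Char));                  // bytes per character, UnicodeString
  Writeln(Length(Raw) * SizeOf(AnsiChar)); // payload size of Raw in bytes
  Writeln(Length(S) * SizeOf(Char));       // payload size of S in bytes
end.
```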

Delphi XE - RawByteString vs AnsiString

2 answers: