为什么两个TBytes不能使用重叠数据?

时间:2016-08-31 12:03:50

标签: delphi dynamic-arrays

考虑以下XE6代码。目的是ThingData应该写入控制台Thing1& Thing2,但事实并非如此。那是为什么?

program BytesFiddle;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

type
  TThing = class
  private
    FBuf : TBytes;
    FData : TBytes;
    function GetThingData: TBytes;
    function GetThingType: Byte;
  public
    property ThingType : Byte read GetThingType;
    property ThingData : TBytes read GetThingData;

    constructor CreateThing(const AThingType : Byte; const AThingData: TBytes);
  end;

{ TThing1 }

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + 1);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));

  FData := @FBuf[1];
  SetLength(FData, Length(FBuf) - 1);
end;

function TThing.GetThingData: TBytes;
begin
  Result := FData;
end;

function TThing.GetThingType: Byte;
begin
  Result := FBuf[0];
end;

var
  Thing1, Thing2 : TThing;

begin
  try
    Thing1 := TThing.CreateThing(0, TEncoding.UTF8.GetBytes('Sneetch'));
    Thing2 := TThing.CreateThing(1, TEncoding.UTF8.GetBytes('Star Belly Sneetch'));

    Writeln(TEncoding.UTF8.GetString(Thing2.ThingData));
    Writeln(Format('Type %d', [Thing2.ThingType]));

    Writeln(TEncoding.UTF8.GetString(Thing1.ThingData));
    Writeln(Format('Type %d', [Thing1.ThingType]));

    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

4 个答案:

答案 0 :(得分:33)

Let me walk you through the ways in which this code fails and how the compiler allows you to shoot yourself in the foot.

If you step through the code using the debugger you can see what happens.

enter image description here

After the initialization of Thing1 you can see that FData is filled with all zeros.
Strangely enough Thing2 is fine.
Therefore the error is in CreateThing. Let's investigate further...

In the oddly named constructor CreateThing you have the following line:

FData := @FBuf[1];

This looks like a simple assignment, but is really a call to DynArrayAssign

Project97.dpr.32: FData := @FBuf[1];
0042373A 8B45FC           mov eax,[ebp-$04]
0042373D 83C008           add eax,$08
00423743 8B5204           mov edx,[edx+$04]
00423746 42               inc edx
00423747 8B0DE03C4000     mov ecx,[$00403ce0]
0042374D E8E66DFEFF       call @DynArrayAsg      <<-- lots of stuff happening here.  

One of the checks DynArrayAsg performs is to check whether the source dynamic array is empty or not.
DynArrayAsg also does a few other things which you need to be aware about.

Let's first have a look at the structure of a dynamic array; it's not just a simple pointer to an array!

Offset 32/64  |   Contents     
--------------+--------------------------------------------------------------
-8/-12        | 32 bit reference count
-4/-8         | 32 or 64 bit length indicator 
 0/ 0         | data of the array.

Performing FData = @FBuf[1] you are messing up with the prefix fields of the dynamic array.
The 4 bytes in front of @Fbuf[1] are interpreted as the length.
For Thing1 these are:

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 00 00 00  08 00 00 00  00  'S' 'n' .....
FData:    00 00 00 08  00 00 00 00  .............. //Hey that's a zero length.

Oops, when DynArrayAsg starts investigating it sees that what it thinks is the source for the assign has a length of zero, i.e. it thinks the source is empty and does not assign anything. It leaves FData unchanged!

Does Thing2 work as intended?
It looks like it does, but it actually fails in rather a bad way, let me show you.

enter image description here

You've successfully tricked the runtime into believing @FBuf[1] is a valid reference to a dynamic array.
Because of this the FData pointer has been updated to point to FBuf[1] (so far so good), and the reference count of FData has been increased by 1 (not good), also the runtime has grown the memory block holding the dynamic array to what it thinks is the correct size for FData (bad).

          -8 (refcnt)  -4 (len)     0 (data)
FBuf:     01 01 00 00  13 00 00 00  01  'S' 'n' .....
FData:    01 00 00 13  00 00 00 01  'S' .............. 

Oops FData now has a refcount of 318,767,105 and a length of 16,777,216 bytes.
FBuf also has its length increased, but its refcount is now 257.

This is why you need the call to SetLength to undo the massive overallocation of memory. This still does not fix the reference counts though.
The overallocation may cause out of memory errors (esp. on 64-bit) and the wacky refcounts cause a memory leak because your arrays will never get freed.

The solution
As per David's answer: enable typed checked pointers: {$TYPEDADDRESS ON}

You can fix the code by defining FData as a normal PAnsiChar or PByte.
If you make sure to always terminate your assignments to FBuf with a double zero FData will work as expected.

Make FData a TBuffer like so:

TBuffer = record
private
  FData : PByte;
  function GetLength: cardinal;
  function GetType: byte;
public
  class operator implicit(const A: TBytes): TBuffer;
  class operator implicit(const A: TBuffer): PByte;
  property Length: cardinal read GetLength;
  property DataType: byte read GetType;
end;

Rewrite CreateThing like so:

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
begin
  SetLength(FBuf, Length(AThingData) + Sizeof(AThingType) + 2);
  FBuf[0] := AThingType;
  Move(AThingData[0], FBuf[1], Length(AThingData));
  FBuf[Lengh(FBuf)-1]:= 0;
  FBuf[Lengh(FBuf)-2]:= 0;  //trailing zeros for compatibility with pansichar

  FData := FBuf;  //will call the implicit class operator.
end;

class operator TBuffer.implicit(const A: TBytes): TBuffer;
begin
  Result.FData:= PByte(@A[1]);
end;

I don't understand all this mucking about trying to outsmart the compiler.
Why not just declare FData like so:

type
  TMyData = record
    DataType: byte;
    Buffer: Ansistring;  
    ....

And work with that.

答案 1 :(得分:18)

通过启用类型检查指针可以很容易地看到问题。将其添加到代码顶部:

{$TYPEDADDRESS ON}

documentation说:

  

$ T指令控制由...生成的指针值的类型   @运算符和指针类型的兼容性。

     

在{$ T-}状态中,@运算符的结果始终是无类型的   指针(指针)与所有其他指针类型兼容。   当@被应用于{$ T +}状态的变量引用时,   result是一个类型指针,只与Pointer和   与其他指向变量类型的指针。

     

在{$ T-}状态中,除指针之外的不同指针类型是   不兼容(即使它们是指向相同类型的指针)。在里面   {$ T +}状态,指向相同类型的指针是兼容的。

通过该更改,您的程序无法编译。这一行失败了:

FData := @FBuf[1];

错误消息是:

  

E2010不兼容的类型:'System.TArray<System.Byte>''Pointer'

现在,FData的类型为TArray<Byte>,但@FBuf[1]不是动态数组,而是指向动态数组中间字节的指针。两者不兼容。通过在未对类型进行类型检查的默认模式下运行,编译器可以让您犯下这个可怕的错误。这就是为什么这是默认模式完全超出我的范围。

动态数组不仅仅是指向第一个元素的指针 - 还有长度和引用计数等元数据。该元数据存储在距第一个元素的偏移处。因此,您的整个设计都存在缺陷。将类型代码存储在单独的变量中,而不是作为动态数组的一部分。

答案 2 :(得分:6)

动态数组是内部指针,与指针分配兼容;但是赋值右侧唯一正确的指针是nil或另一个动态数组。 FData := @FBuf[1];显然是错误的,但有趣的是,即使FData := @FBuf[0];已启用,$TYPEDADDRESS也可能正常。

以下代码在Delphi XE中编译并按预期工作:

program Project19;

{$APPTYPE CONSOLE}
{$TYPEDADDRESS ON}

uses
  SysUtils;

procedure Test;
var
  A, B: TBytes;

begin
  A:= TBytes.Create(11,22,33);
  B:= @A[0];
  Writeln(B[1]);
end;

begin
  try
    Test;
    readln;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

好像编译器&#34;知道&#34; @A[0]是动态数组,而不仅仅是指针。

答案 3 :(得分:-1)

constructor TThing.CreateThing(const AThingType : Byte; const AThingData: TBytes);
var
  Buffer : array of Byte;
begin
  SetLength(Buffer, Length(AThingData) + Sizeof(AThingType));
  Buffer[0] := AThingType;
  Move(AThingData[0], Buffer[1], Length(AThingData));

  SetLength(FBuf, Length(Buffer));
  Move(Buffer[0], FBuf[0], Length(Buffer));
  SetLength(FData, Length(AThingData));
  Move(Buffer[1], FData[0], Length(AThingData));
end;