注意:
原帖子标题
为什么DWScript中的多线程JSON解析器无法按线程数进行扩展?
已更改,因为此问题与使用DWScript 处理JSON数据无关。 问题出在Delphi XE2到XE7的默认内存管理器中(测试的是XE2和试用版XE7),但问题首先出现在这种类型的应用程序中。
我有多线程的Win32 / Win64 vcl应用程序,它在Delphi XE2中处理JSON数据。
每个线程使用DWScript中的TdwsJSONValue.ParseString(sJSON)
解析JSON数据,使用DWScript方法读取值并将结果存储为记录。
出于测试目的,我在每个线程中处理相同的JSON数据。
单个thead运行在线程内花费N
秒来处理数据。越来越多的线程到M
lineary(大约M * N
)会增加处理相同数据所需的单线程内的时间。
结果没有速度提升。此应用程序的其他部分(JSON数据传递,将结果存储在目标环境中) - 按预期进行扩展。
可能是什么原因?任何想法都赞赏。
补充资料:
在Win7 / 32和Win7 / 64,Win8 / 64上测试,从2核到12核(w / w-out HT)系统
选择最快的DWScript(测试了一堆,其中包括:Superobject,内置Delphi)。 SO的行为类似于DWS中的JSON单元。
以下是完整的控制台应用程序,说明了问题。要运行它,我们需要这里提供的示例json数据:https://www.dropbox.com/s/4iuv87ytpcdugk6/json1.zip?dl=0此文件包含第一个线程的数据json1.dat
。对于最多16个线程,只需将json1.dat复制到json2.dat ... json16.dat即可。
程序和数据应该在同一个文件夹中。运行:convert.exe N,其中N是线程数。
程序将执行时间在msecs中写入stout - 在线程中花费,解析数据的时间和释放时间(Destroy)TdwsJSONValue对象。
声明_dwsjvData.Destroy;
无法扩展。
program Convert;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils,
System.Diagnostics,
System.Classes,
dwsJSON in 'dwsJSON.pas',
dwsStrings in 'dwsStrings.pas',
dwsUtils in 'dwsUtils.pas',
dwsXPlatform in 'dwsXPlatform.pas';
type
TWorkerThread = class (TThread)
private
_iUid: Integer;
_swWatch: TStopwatch;
_lRunning: Boolean;
_sFileJSonData: String;
_fJsonData: TextFile;
protected
constructor Create (AUid: Integer);
procedure Execute; override;
published
property Running: Boolean read _lRunning;
end;
TConverter = class (TObject)
private
_swWatch0, _swWatch1, _swWatch2: TStopwatch;
_dwsjvData: TdwsJSONValue;
protected
constructor Create;
destructor Destroy; override;
function Calculate (AUid: Integer; AJSonData: String; var AParse, ADestroy: Integer): Integer;
end;
const
MAX_THREADS = 16;
var
iHowMany: Integer;
athWorker: array [1..MAX_THREADS] of Pointer;
aiElapsed: array [1..MAX_THREADS] of Integer;
aiElapsedParse: array [1..MAX_THREADS] of Integer;
aiElapsedDestroy: array [1..MAX_THREADS] of Integer;
aiFares: array [1..MAX_THREADS] of Integer;
swWatchT, swWatchP: TStopwatch;
constructor TWorkerThread.Create (AUid: Integer);
begin
inherited Create (True);
_iUid := AUid;
_swWatch := TStopwatch.Create;
_sFileJSonData := ExtractFilePath (ParamStr (0)) + 'json' + Trim (IntToStr (_iUid)) + '.dat';
_lRunning := False;
Suspended := False;
end;
procedure TWorkerThread.Execute;
var
j: Integer;
sLine: String;
slLines: TStringList;
oS: TConverter;
begin
_lRunning := True;
oS := TConverter.Create;
slLines := TStringList.Create;
System.AssignFile (_fJsonData, _sFileJSonData);
System.Reset (_fJsonData);
j := 0;
repeat
System.Readln (_fJsonData, sLine);
slLines.Add (sLine);
Inc (j);
until (j = 50);
// until (System.Eof (_fJsonData));
System.Close (_fJsonData);
Sleep (1000);
_swWatch.Reset;
_swWatch.Start;
aiFares [_iUid] := 0;
aiElapsedParse [_iUid] := 0;
aiElapsedDestroy [_iUid] := 0;
for j := 1 to slLines.Count do
aiFares [_iUid] := aiFares [_iUid] + oS.Calculate (_iUid, slLines.Strings [j - 1], aiElapsedParse [_iUid], aiElapsedDestroy [_iUid]);
_swWatch.Stop;
slLines.Free;
os.Destroy;
aiElapsed [_iUid] := _swWatch.ElapsedMilliseconds;
_lRunning := False;
end;
constructor TConverter.Create;
begin
inherited Create;
_swWatch0 := TStopwatch.Create;
_swWatch1 := TStopwatch.Create;
_swWatch2 := TStopwatch.Create;
end;
destructor TConverter.Destroy;
begin
inherited;
end;
function TConverter.Calculate (AUid: Integer; AJSonData: String; var AParse, ADestroy: Integer): Integer;
var
jFare, jTotalFares, iElapsedParse, iElapsedDestroy, iElapsedTotal: Integer;
begin
_swWatch0.Reset;
_swWatch0.Start;
_swWatch1.Reset;
_swWatch1.Start;
_dwsjvData := TdwsJSONValue.ParseString (AJSonData);
_swWatch1.Stop;
iElapsedParse := _swWatch1.ElapsedMilliseconds;
if (_dwsjvData.ValueType = jvtArray) then
begin
_swWatch2.Reset;
_swWatch2.Start;
jTotalFares := _dwsjvData.ElementCount;
for jFare := 0 to (jTotalFares - 1) do
if (_dwsjvData.Elements [jFare].ValueType = jvtObject) then
begin
_swWatch1.Reset;
_swWatch1.Start;
_swWatch1.Stop;
end;
end;
_swWatch1.Reset;
_swWatch1.Start;
_dwsjvData.Destroy;
_swWatch1.Stop;
iElapsedDestroy := _swWatch1.ElapsedMilliseconds;
_swWatch0.Stop;
iElapsedTotal := _swWatch0.ElapsedMilliseconds;
Inc (AParse, iElapsedParse);
Inc (ADestroy, iElapsedDestroy);
result := jTotalFares;
end;
procedure MultithreadStart;
var
j: Integer;
begin
for j := 1 to iHowMany do
if (athWorker [j] = nil) then
begin
athWorker [j] := TWorkerThread.Create (j);
TWorkerThread (athWorker [j]).FreeOnTerminate := False;
TWorkerThread (athWorker [j]).Priority := tpNormal;
end;
end;
procedure MultithreadStop;
var
j: Integer;
begin
for j := 1 to MAX_THREADS do
if (athWorker [j] <> nil) then
begin
TWorkerThread (athWorker [j]).Terminate;
TWorkerThread (athWorker [j]).WaitFor;
TWorkerThread (athWorker [j]).Free;
athWorker [j] := nil;
end;
end;
procedure Prologue;
var
j: Integer;
begin
iHowMany := StrToInt (ParamStr (1));
for j := 1 to MAX_THREADS do
athWorker [j] := nil;
swWatchT := TStopwatch.Create;
swWatchT.Reset;
swWatchP := TStopwatch.Create;
swWatchP.Reset;
end;
procedure RunConvert;
function __IsRunning: Boolean;
var
j: Integer;
begin
result := False;
for j := 1 to MAX_THREADS do
result := result or ((athWorker [j] <> nil) and TWorkerThread (athWorker [j]).Running);
end;
begin
swWatchT.Start;
MultithreadStart;
Sleep (1000);
while (__isRunning) do
Sleep (500);
MultithreadStop;
swWatchT.Stop;
Writeln (#13#10, 'Total time:', swWatchT.ElapsedMilliseconds);
end;
procedure Epilogue;
var
j: Integer;
begin
for j := 1 to iHowMany do
Writeln ( #13#10, 'Thread # ', j, ' tot.time:', aiElapsed [j], ' fares:', aiFares [j], ' tot.parse:', aiElapsedParse [j], ' tot.destroy:', aiElapsedDestroy [j]);
Readln;
end;
begin
try
Prologue;
RunConvert;
Epilogue;
except
on E: Exception do
Writeln (E.ClassName, ': ', E.Message);
end;
end.
答案 0 :(得分:1)
您是否尝试过我的可扩展内存管理器?因为Delphi(内部使用fastmm)不能很好地扩展字符串和其他与内存相关的东西: https://scalemm.googlecode.com/files/ScaleMM_v2_4_1.zip
你也可以尝试我的探查器的两种探查器模式,看看哪个部分是瓶颈: https://code.google.com/p/asmprofiler/
答案 1 :(得分:1)
该解决方案是具有英特尔®线程构建模块内存管理器的交换默认Delphi XE2或XE7内存管理器。在示例应用程序中,它缩放ca.当app为64位时,线性数量最多为16个。
update: with assumption that number of threads running is less than number of cores
这是在2cores / 4ht到12cores / 24ht的机器上测试的,运行KVM虚拟化Windows 7,内存为124GB
有趣的是虚拟化Win 7.内存分配和释放比原生Win 7快2倍。
结论:如果在多线程(超过4-8个线程)应用程序的线程中对10kB-10MB块执行大量内存分配/解除分配操作 - 只使用Intel的内存管理器。
@André:感谢小费指向我正确的方向!
这是TBB内存管理器用于测试的单元,它必须在主项目文件中的单元列表中显示为第1个.dpr
unit TBBMem;
interface
function ScalableGetMem (ASize: NativeInt): Pointer; cdecl; external 'tbbmalloc' name 'scalable_malloc';
procedure ScalableFreeMem (APtr: Pointer); cdecl; external 'tbbmalloc' name 'scalable_free';
function ScalableReAlloc (APtr: Pointer; Size: NativeInt): Pointer; cdecl; external 'tbbmalloc' name 'scalable_realloc';
implementation
Function TBBGetMem (ASize: Integer): Pointer;
begin
result := ScalableGetMem (ASize);
end;
Function TBBFreeMem (APtr: Pointer): Integer;
begin
ScalableFreeMem (APtr);
result := 0;
end;
Function TBBReAllocMem (APtr: Pointer; ASize: Integer): Pointer;
begin
result := ScalableRealloc (APtr, ASize);
end;
const
TBBMemoryManager: TMemoryManager = ( GetMem: TBBGetmem;
FreeMem: TBBFreeMem;
ReAllocMem: TBBReAllocMem; );
var
oldMemoryManager: TMemoryManager;
initialization
GetMemoryManager (oldMemoryManager);
SetMemoryManager (TBBMemoryManager);
finalization
SetMemoryManager (oldMemoryManager);
end.
答案 2 :(得分:1)
我对FastCode MM Challenge进行了(重新)测试,结果对于TBB来说并不是那么好(在块缩小测试中也是内存异常)。
简而言之:在这个复杂的测试中,ScaleMM2和Google TCmalloc是最快的,Fastmm和ScaleMM2使用的内存最少。
Average Speed Performance: (Scaled so that the winner = 100%)
XE6 : 70,4
TCmalloc : 89,1
ScaleMem2 : 100,0
TBBMem : 77,8
Average Memory Performance: (Scaled so that the winner = 100%)
XE6 : 100,0
TCmalloc : 29,6
ScaleMem2 : 75,6
TBBMem : 38,4
FastCode挑战:https://code.google.com/p/scalemm/source/browse/#svn%2Ftrunk%2FChallenge
TBB 4.3:https://www.threadingbuildingblocks.org/download