Question

I have a text file like this:

"01","AAA","AAAAA" 
"02","BBB","BBBBB","BBBBBBBB" 
"03","CCC" 
"04","DDD","DDDDD"

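I want to load this text file's data into a temp table in a Sybase DB, so I need to write a program that reads the text file line by line until EOF. If the text file is small, reading it line by line is fast. But if the text file is very large (possibly more than 500 MB), the line-by-line process is far too slow. I don't think reading line by line is suitable for large text files, so I need another way to load the text file data into the DB instead of reading it line by line. Any suggestions? Sample code: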

var
  myFile : TextFile;
  text   : string;

begin
  // Open the Test.txt file for reading
  AssignFile(myFile, 'Test.txt');
  Reset(myFile);

  // Copy each line into the temp table (fixed field positions)
  while not Eof(myFile) do
  begin
    ReadLn(myFile, text);
    TempTable.Append;
    TempTable.FieldByName('Field1').AsString := Copy(text,2,2);
    TempTable.FieldByName('Field2').AsString := Copy(text,7,3);
    TempTable.FieldByName('Field3').AsString := Copy(text,13,5);
    TempTable.FieldByName('Field4').AsString := Copy(text,21,8);
    TempTable.Post;
  end;

  // Close the file
  CloseFile(myFile);
end;
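Side note on the sample code: the fixed Copy offsets only work while every field has exactly the width shown in the sample rows; a single shorter value shifts every later offset. Below is a minimal sketch of splitting each line into its quoted, comma-separated fields instead, assuming that commas separate fields, that fields may be wrapped in double quotes, and that "" inside a quoted field stands for one literal quote. SplitQuotedLine and TFieldArray are hypothetical names written for this sketch, not RTL routines.

```pascal
program SplitQuotedDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

type
  TFieldArray = array of string; // hypothetical: one entry per field

// Splits a line such as "01","AAA","AAAAA" into its field values.
// Assumes commas separate fields, fields may be wrapped in double quotes,
// and "" inside a quoted field stands for one literal quote.
function SplitQuotedLine(const Line: string): TFieldArray;
var
  S, Current: string;
  i, Count: Integer;
  InQuotes: Boolean;
begin
  SetLength(Result, 0);
  S := Trim(Line);              // drop trailing blanks like those after the sample rows
  Current := '';
  Count := 0;
  InQuotes := False;
  i := 1;
  while i <= Length(S) do
  begin
    if S[i] = '"' then
    begin
      if InQuotes and (i < Length(S)) and (S[i + 1] = '"') then
      begin
        Current := Current + '"'; // "" inside quotes: keep one quote
        Inc(i);                   // and skip the second one
      end
      else
        InQuotes := not InQuotes; // opening or closing quote
    end
    else if (S[i] = ',') and not InQuotes then
    begin
      SetLength(Result, Count + 1); // a comma outside quotes ends a field
      Result[Count] := Current;
      Inc(Count);
      Current := '';
    end
    else
      Current := Current + S[i];
    Inc(i);
  end;
  SetLength(Result, Count + 1);     // the last field has no trailing comma
  Result[Count] := Current;
end;

var
  Fields: TFieldArray;
  i: Integer;
begin
  Fields := SplitQuotedLine('"02","BBB","BBBBB","BBBBBBBB" ');
  for i := 0 to High(Fields) do
    Writeln(i, ': ', Fields[i]);
end.
```

In the import loop, `Fields := SplitQuotedLine(text)` would replace the four Copy calls, and `Length(Fields)` shows how many values a short row such as "03","CCC" really has. TStrings.DelimitedText also understands quoted fields, but in Delphi 7 it treats spaces as delimiters too, which is why a hand-written splitter is sketched here.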

Answer 1

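Text files usually have a very small buffer (the default TextFile buffer is only 128 bytes). Look into using the SetTextBuf function to improve performance.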

var
  myFile : TextFile;
  text   : string;
  myFileBuffer: Array[1..32768] of byte;
begin
  // Open the Test.txt file for reading, with a 32 KB text buffer
  AssignFile(myFile, 'Test.txt');
  SetTextBuf(myFile, myFileBuffer); // set the buffer before any I/O on the file
  Reset(myFile);

  // Read the file contents line by line
  while not Eof(myFile) do
  begin
    ReadLn(myFile, text);
    // process the line here
  end;

  // Close the file
  CloseFile(myFile);
end;

Answer 2

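Some general tips:

Make sure your TempTable is in memory, or use a fast database engine: look at SQLite3 or other options (such as Firebird embedded, NexusDB or ElevateDB) as possible database alternatives;
If you don't use a TTable but a real database, make sure you nest the inserts in a transaction;
For a real database, check whether you can use the Array DML feature, which is much faster for inserting a lot of data into a remote database (like Sybase); such Array DML is handled, for instance, with FireDAC, AFAIK;
FieldByName('...') is known to be slow when called for every row, since it looks the field up by name each time: use local TField variables instead;
When using TextFile, give it a bigger temporary buffer;
If you use a newer Unicode version of Delphi (2009+), TextFile is not the best option.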

So your code could be:

var
  myFile : TextFile;
  myFileBuffer: array[word] of byte;
  text   : string;
  Field1, Field2, Field3, Field4: TField;
begin

  // Set Field* local variables for speed within the main loop
  Field1 := TempTable.FieldByName('Field1');
  Field2 := TempTable.FieldByName('Field2');
  Field3 := TempTable.FieldByName('Field3');
  Field4 := TempTable.FieldByName('Field4');

  // Open the Test.txt file for reading
  AssignFile(myFile, 'Test.txt');
  SetTextBuf(myFile, myFileBuffer); // use a 64 KB read buffer
  Reset(myFile);

  // Copy each line into the temp table
  while not Eof(myFile) do
  begin
    ReadLn(myFile, text);
    TempTable.append;
    Field1.asInteger := StrToInt(Copy(text,2,2));
    Field2.asString := Copy(text,7,3);
    Field3.asString := Copy(text,13,5);
    Field4.asString := Copy(text,21,8);
    TempTable.post;
  end;

  // Close the file for the last time
  CloseFile(myFile);
end;

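You can achieve very high speed with embedded engines, with almost no size limit other than your storage. See, for instance, how fast we can add content to a SQLite3 database in our ORM: about 130,000 / 150,000 rows per second into a database file, including all ORM marshalling. I also found that SQLite3 produces much smaller database files than the alternatives. If you want fast retrieval on any field, don't forget to define INDEXes in your database, if possible after inserting the row data (for better speed). For SQLite3, there is already an ID/RowID integer primary key available, which could map your first data field. This ID/RowID integer primary key is already indexed by SQLite3. By the way, our ORM now supports …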

Answer 3

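Besides what has already been said, I would also avoid any TTable components. It is better to use a TQuery-type component (depending on the access layer you use). Something like this: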

// Set up once, before importing (SQL is a TStrings, so assign its Text)
qryImport.SQL.Text := 'Insert Into MyTable Values (:Field1, :Field2, :Field3, :Field4)';

Procedure ImportRecord(Const pField1, pField2, pField3, pField4 : String);
Begin
  qryImport.Close;
  qryImport.Params[0].AsString := pField1;
  qryImport.Params[1].AsString := pField2;
  qryImport.Params[2].AsString := pField3;
  qryImport.Params[3].AsString := pField4;
  qryImport.ExecSQL;
End;

Hope this helps.

Answer 4

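Another approach is to use memory-mapped files (you can Google for them or go to torry.net to find implementations). It won't work for files > 1 GB in Win32 (in Win64 you can map almost any file). It turns your whole file into a PAnsiChar that you can scan like one big buffer, searching for #10 and #13 (alone or in pairs) and thereby splitting the lines manually.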
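The scanning half of this approach might look like the sketch below. On Windows the two pointers would come from a view created with CreateFile, CreateFileMapping and MapViewOfFile; to keep the sketch portable, an ordinary AnsiString stands in for the mapped view. NextLine and CountLines are hypothetical helper names, not part of any library.

```pascal
program ScanBufferDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Extracts the line that starts at P, stopping at #13 or #10, and moves P
// past the terminator (CR LF, lone LF and lone CR are all accepted).
// Returns False once P has reached BufEnd.
function NextLine(var P: PAnsiChar; BufEnd: PAnsiChar; out Line: AnsiString): Boolean;
var
  Start: PAnsiChar;
begin
  Result := P < BufEnd;
  if not Result then
    Exit;
  Start := P;
  while (P < BufEnd) and (P^ <> #10) and (P^ <> #13) do
    Inc(P);
  SetString(Line, Start, P - Start); // copy just this line out of the buffer
  if (P < BufEnd) and (P^ = #13) then
    Inc(P);                          // skip CR
  if (P < BufEnd) and (P^ = #10) then
    Inc(P);                          // skip LF, alone or after CR
end;

// Demo helper: counts the lines in Data by scanning it as one raw buffer,
// the way a memory-mapped view would be scanned
function CountLines(const Data: AnsiString): Integer;
var
  P, BufEnd: PAnsiChar;
  Line: AnsiString;
begin
  Result := 0;
  P := PAnsiChar(Data);
  BufEnd := P + Length(Data);
  while NextLine(P, BufEnd, Line) do
    Inc(Result);
end;

var
  Data, Line: AnsiString;
  P, BufEnd: PAnsiChar;
begin
  // stands in for a MapViewOfFile view of Test.txt
  Data := '"01","AAA","AAAAA"'#13#10'"02","BBB"'#10'"03","CCC"';
  P := PAnsiChar(Data);
  BufEnd := P + Length(Data);
  while NextLine(P, BufEnd, Line) do
    Writeln(Line);
  Writeln(CountLines(Data), ' lines');
end.
```

SetString copies each line out of the buffer; a faster variant could parse the fields directly between Start and P without building a string at all.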

Answer 5

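If you use (or don't mind starting to use) the JEDI JVCL, it has a TJvCSVDataSet that lets you work with CSV files just like any other dataset in Delphi, including defining persistent fields and using "standard" Delphi database functionality: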

var
  i: Integer;
begin
  JvCSVDataSet1.FileName := 'MyFile.csv';
  JvCSVDataSet1.Open;
  while not JvCSVDataSet1.Eof do
  begin
    TempTable.Append; // Append posts the previously appended row
                      // automatically, so no Post call is needed here.

    // Assumes TempTable has the same number of fields in the
    // same order
    for i := 0 to JvCSVDataSet1.FieldCount - 1 do
      TempTable.Fields[i].AsString := JvCSVDataSet1.Fields[i].AsString;
    JvCSVDataSet1.Next;
  end;

  // Post the last appended row once the loop above has exited
  if TempTable.State in dsEditModes then
    TempTable.Post;
end;

Answer 6

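In Delphi 7, you can use TurboPower SysTools' TStAnsiTextStream to read and write in a line-oriented way, but with a thread-safe TStream implementation instead of the unsafe old Pascal file interface. In later Delphi versions you'll find something similar in the standard RTL (although the implementation differs slightly), but Delphi 7 offers little help for text file handling.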
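To give an idea of what such a stream-based reader does, here is a minimal sketch of a buffered line reader over any TStream, roughly the service TStAnsiTextStream (and, from Delphi 2009 on, the RTL's TStreamReader) provides. TLineReader is a hypothetical class written for this sketch, not the API of either; it assumes single-byte (ANSI) text and accepts CR LF, LF or a lone CR as the line terminator, including a CR LF pair split across two reads.

```pascal
program LineReaderDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  Classes;

type
  // Hypothetical buffered line reader over any TStream (ANSI text assumed)
  TLineReader = class
  private
    FStream: TStream;
    FBuffer: array of AnsiChar;
    FPos, FLen: Integer;
    FSkipLF: Boolean; // previous line ended with CR: drop a directly following LF
    function FillBuffer: Boolean;
  public
    constructor Create(AStream: TStream; BufSize: Integer = 65536);
    function ReadLine(out Line: AnsiString): Boolean;
  end;

constructor TLineReader.Create(AStream: TStream; BufSize: Integer);
begin
  inherited Create;
  FStream := AStream;
  SetLength(FBuffer, BufSize);
end;

// Loads the next chunk of the stream; returns False at end of stream
function TLineReader.FillBuffer: Boolean;
begin
  FLen := FStream.Read(FBuffer[0], Length(FBuffer));
  FPos := 0;
  Result := FLen > 0;
end;

// Returns the next line without its terminator; False once the stream is exhausted
function TLineReader.ReadLine(out Line: AnsiString): Boolean;
var
  Start: Integer;
  Chunk: AnsiString;
begin
  Line := '';
  Result := False;
  if FSkipLF then
  begin
    FSkipLF := False;
    // the LF of a CR LF pair may sit at the start of the next chunk
    if (FPos < FLen) or FillBuffer then
      if FBuffer[FPos] = #10 then
        Inc(FPos);
  end;
  while True do
  begin
    if (FPos >= FLen) and not FillBuffer then
      Exit; // stream exhausted; Result is True if a final unterminated line was read
    Result := True;
    Start := FPos;
    while (FPos < FLen) and (FBuffer[FPos] <> #10) and (FBuffer[FPos] <> #13) do
      Inc(FPos);
    SetString(Chunk, PAnsiChar(@FBuffer[Start]), FPos - Start);
    Line := Line + Chunk;
    if FPos < FLen then
    begin
      FSkipLF := FBuffer[FPos] = #13;
      Inc(FPos); // consume the CR or LF that ended the line
      Exit;
    end;
    // no terminator left in this chunk: the line continues in the next one
  end;
end;

// Demo helper: counts the lines of Data using the given buffer size
function CountStreamLines(const Data: AnsiString; BufSize: Integer): Integer;
var
  Stream: TMemoryStream;
  Reader: TLineReader;
  Line: AnsiString;
begin
  Result := 0;
  Stream := TMemoryStream.Create;
  try
    if Data <> '' then
      Stream.WriteBuffer(Data[1], Length(Data));
    Stream.Position := 0;
    Reader := TLineReader.Create(Stream, BufSize);
    try
      while Reader.ReadLine(Line) do
        Inc(Result);
    finally
      Reader.Free;
    end;
  finally
    Stream.Free;
  end;
end;

begin
  // a 4-byte buffer forces most lines to span several reads
  Writeln(CountStreamLines('"01","AAA"'#13#10'"02","BBB"'#13#10'"03","CCC"', 4));
end.
```

For the real file you would pass something like `TFileStream.Create('Test.txt', fmOpenRead or fmShareDenyWrite)` and feed each line to the field-splitting and insert code; the 64 KB default buffer plays the same role as SetTextBuf does for TextFile.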

Best solution for processing large text file data in Delphi 7
