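I have a program that computes various hashes of the files in FolderA and FolderB and then compares the results to make sure the two folders are identical. The filenames are stored in a TStringListUTF8, and the hash values are stored, as strings, in TStringLists.
The problem is that although this works for small amounts of data, with several hundred thousand files the program times out and/or takes a very long time, especially once SHA-1, SHA256 and SHA512 are included.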
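I would like to make it faster and more efficient by storing the binary (integer) representation of each hash digest in a TList and then comparing those values as integers or something similar. My thinking is that if I create two TLists, one for each folder, and put the hash digests into them, I could quickly and easily compare the individual hash values to find any differences, and if the two lists are identical I could confirm that in a flash as well.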
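To make the idea concrete, here is a minimal sketch of what storing raw digests in a TList could look like. It rests on a few assumptions: TList only holds pointers, so each 16-byte digest is copied to the heap first; `PDigest`, `AddDigest`, `CompareDigests`, `SameLists` and `FreeDigestList` are hypothetical names made up for illustration; and `MDString` stands in for `MDFile` so that the sketch runs without any files on disk.

```pascal
program DigestListSketch;
{$mode objfpc}{$H+}

uses
  Classes, md5;

type
  PDigest = ^TMDDigest; // hypothetical alias for a heap-allocated digest

// Copy a 16-byte digest to the heap and store its pointer in the list.
procedure AddDigest(List: TList; const D: TMDDigest);
var
  P: PDigest;
begin
  New(P);
  P^ := D;
  List.Add(P);
end;

// Callback for TList.Sort: orders digests byte by byte.
function CompareDigests(Item1, Item2: Pointer): Integer;
var
  C: SizeInt;
begin
  C := CompareByte(PDigest(Item1)^, PDigest(Item2)^, SizeOf(TMDDigest));
  if C < 0 then Result := -1
  else if C > 0 then Result := 1
  else Result := 0;
end;

// True when both (already sorted) lists hold the same digests.
function SameLists(A, B: TList): Boolean;
var
  i: Integer;
begin
  Result := A.Count = B.Count;
  i := 0;
  while Result and (i < A.Count) do
  begin
    Result := MDMatch(PDigest(A[i])^, PDigest(B[i])^);
    Inc(i);
  end;
end;

// Release every digest, then the list itself.
procedure FreeDigestList(List: TList);
var
  i: Integer;
begin
  for i := 0 to List.Count - 1 do
    Dispose(PDigest(List[i]));
  List.Free;
end;

var
  ListA, ListB: TList;
begin
  ListA := TList.Create;
  ListB := TList.Create;
  try
    // Assumption: in the real program these would be MDFile(FileName, MD_VERSION_5).
    AddDigest(ListA, MDString('file one', MD_VERSION_5));
    AddDigest(ListA, MDString('file two', MD_VERSION_5));
    AddDigest(ListB, MDString('file two', MD_VERSION_5));
    AddDigest(ListB, MDString('file one', MD_VERSION_5));
    // Sorting makes the comparison independent of enumeration order.
    ListA.Sort(@CompareDigests);
    ListB.Sort(@CompareDigests);
    if SameLists(ListA, ListB) then
      WriteLn('MATCH')
    else
      WriteLn('MIS-MATCH');
  finally
    FreeDigestList(ListA);
    FreeDigestList(ListB);
  end;
end.
```

Sorting both lists mirrors what the sorted TStringList does in the string version; a sorted list of digests cannot say which file a mismatching digest belongs to, so the real program would probably need to keep the filename alongside each digest.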
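As a simple demonstration, I have created a really basic, stripped-down version of the current program, in which you will hopefully see where I am attempting the binary storage. The demo just hashes every file in FolderA and stores the string version of each hash in a StringList, then repeats the same for FolderB. The two StringLists are then themselves hashed to see whether they match. The real program also checks each entry in the lists for mismatches to determine which files are wrong.
Here is the code for the simplified version: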
// Choose Source Folder
procedure TForm1.Button1Click(Sender: TObject);
begin
  if SelectDirectoryDialog1.Execute then
  begin
    label1.Caption := SelectDirectoryDialog1.FileName;
  end;
end;

// Choose Second Folder
procedure TForm1.Button2Click(Sender: TObject);
begin
  if SelectDirectoryDialog2.Execute then
  begin
    label2.Caption := SelectDirectoryDialog2.FileName;
  end;
end;

// Compare FolderA against FolderB
procedure TForm1.Button3Click(Sender: TObject);
var
  FolderA, FolderB : TStringListUTF8;
  ListOfHashesA, ListOfHashesB : TStringList;
  HashOfFolderA, HashOfFolderB : string;
  ListOfBinHashesA : TList;
begin
  // FindAllFilesEx returns the list itself, so a separate Create would only leak
  FolderA := FindAllFilesEx(SelectDirectoryDialog1.FileName, '*', True, True);
  ListOfHashesA := nil;
  try
    ListOfHashesA := HashFolder(FolderA); // List of hashes of all files in FolderA
    // This is where I hope to replace the use of strings for binary representation
    // of the hashes
    //ListOfBinHashesA := BinaryHashFolder(FolderA); // List of hashes of all files in FolderA
    // Now generate hash of the FolderA hash list itself. Faster than comparing each hash value line by line
    HashOfFolderA := Uppercase(MDPrint(MDString(ListOfHashesA.Text, MD_VERSION_5)));
  finally
    ListOfHashesA.Free;
    FolderA.Free;
  end;

  FolderB := FindAllFilesEx(SelectDirectoryDialog2.FileName, '*', True, True);
  ListOfHashesB := nil;
  try
    ListOfHashesB := HashFolder(FolderB); // List of hashes of all files in FolderB
    // Now generate hash of the FolderB hash list itself. Faster than comparing each hash value line by line
    HashOfFolderB := Uppercase(MDPrint(MDString(ListOfHashesB.Text, MD_VERSION_5)));
  finally
    ListOfHashesB.Free;
    FolderB.Free;
  end;

  if HashOfFolderA = HashOfFolderB then
    ShowMessage(HashOfFolderA + ' ' + HashOfFolderB + ' : MATCH')
  else
    ShowMessage(HashOfFolderA + ' ' + HashOfFolderB + ' : MIS-MATCH');
end;

{ I want this function to compute the binary hash of each file, store it as an integer,
  and return a big TList full of the values for FolderA; I'll then call it again for FolderB: }
function TForm1.BinaryHashFolder(FolderName : TStringListUTF8) : TList;
var
  i : integer;
begin
  Result := TList.Create;
  for i := 0 to FolderName.Count - 1 do
  begin
    // Compute binary hash val for each file and add to the TList.
    // This is the line I can't get to work: TList.Add expects a Pointer, not a 16-byte digest
    // Result.Add(MD5File(FolderName.Strings[i], 2097152));
  end;
end;

// Looks through the list of filenames and computes string version of MD5 value for each file
// Each hash value is added to a string list and that whole list is then returned
function TForm1.HashFolder(FolderName : TStringListUTF8) : TStringList;
var
  i : integer;
begin
  Result := TStringList.Create;
  // A sorted TStringList ignores duplicate strings by default, which would
  // silently drop the hashes of identical files
  Result.Duplicates := dupAccept;
  Result.Sorted := true;
  for i := 0 to FolderName.Count - 1 do
  begin
    Result.Add(Uppercase(MD5Print(MD5File(FolderName.Strings[i]))));
  end;
end;

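I am hoping one of you can help me get the TList approach working, because I cannot make it work. I do not even know whether I am on the right track. Any suggestions for a better solution are welcome.
Note: I am using Free Pascal 2.6.4 and Lazarus 1.4.4.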
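As one possible alternative to TList, the digests could also live in a plain dynamic array, which keeps them in one contiguous block and avoids a heap allocation per file. The following is only a sketch under assumptions: `TDigestArray`, `SameDigests` and `FirstMismatch` are hypothetical names, both arrays are assumed to be filled in the same file order, and `MDString` again stands in for `MDFile`.

```pascal
program DigestArraySketch;
{$mode objfpc}{$H+}

uses
  md5;

type
  TDigestArray = array of TMDDigest; // hypothetical name: raw 16-byte digests

// True when both arrays hold the same digests in the same order.
// The elements are contiguous, so one CompareByte call covers the whole block.
function SameDigests(const A, B: TDigestArray): Boolean;
begin
  Result := Length(A) = Length(B);
  if Result and (Length(A) > 0) then
    Result := CompareByte(A[0], B[0], Length(A) * SizeOf(TMDDigest)) = 0;
end;

// Index of the first differing entry, or -1 when the arrays are identical.
// If one array is a prefix of the other, the shorter length is returned.
function FirstMismatch(const A, B: TDigestArray): Integer;
var
  i, n: Integer;
begin
  n := Length(A);
  if Length(B) < n then
    n := Length(B);
  Result := -1;
  for i := 0 to n - 1 do
    if not MDMatch(A[i], B[i]) then
    begin
      Result := i;
      Exit;
    end;
  if Length(A) <> Length(B) then
    Result := n;
end;

var
  A, B: TDigestArray;
begin
  SetLength(A, 2);
  SetLength(B, 2);
  // Assumption: in the real program these would be MDFile(FileName, MD_VERSION_5).
  A[0] := MDString('first file', MD_VERSION_5);
  A[1] := MDString('second file', MD_VERSION_5);
  B[0] := MDString('first file', MD_VERSION_5);
  B[1] := MDString('second file, modified', MD_VERSION_5);
  if SameDigests(A, B) then
    WriteLn('MATCH')
  else
    WriteLn('MIS-MATCH at index ', FirstMismatch(A, B));
end.
```

Because the index of a digest matches the index of its filename in the file list, a mismatch can be traced back to a file directly, provided both folders are enumerated in the same relative order.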