I'm reading the contents of a zip file and trying to extract them.
var allZipEntries = ZipFile.Open(zipFileFullPath, ZipArchiveMode.Read).Entries;
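Extracting them with a foreach loop works fine. The drawback is that this is equivalent to the zip.extract method, so I gain nothing when I intend to extract all the files anyway.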
foreach (var currentEntry in allZipEntries)
{
    if (currentEntry.FullName.Equals(currentEntry.Name))
    {
        currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
    }
    else
    {
        var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
        Directory.CreateDirectory(subDirectoryPath);
        currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
    }
}
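For reference, the framework already has a one-call "extract everything" method, `ZipFile.ExtractToDirectory(sourceArchiveFileName, destinationDirectoryName)`, which recreates the folder structure. The sketch below puts it next to a hand-written loop that does roughly what the question's foreach loop does, but disposes the archive and handles directory entries (their `Name` is empty). The class and method names are made up for illustration.

```csharp
using System.IO;
using System.IO.Compression;

static class SequentialExtract
{
    // Built-in: extracts every entry and recreates sub-folders.
    // This overload does not overwrite files that already exist.
    public static void ExtractAllBuiltIn(string zipFileFullPath, string tempPath)
    {
        ZipFile.ExtractToDirectory(zipFileFullPath, tempPath);
    }

    // Roughly the question's foreach loop, with the archive disposed afterwards.
    public static void ExtractAllManually(string zipFileFullPath, string tempPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipFileFullPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string targetPath = Path.Combine(tempPath, entry.FullName);
                if (entry.Name.Length == 0)
                {
                    // Directory entry (FullName ends with '/'): only create the folder.
                    Directory.CreateDirectory(targetPath);
                    continue;
                }
                Directory.CreateDirectory(Path.GetDirectoryName(targetPath));
                entry.ExtractToFile(targetPath);
            }
        }
    }
}
```

Both run on a single thread; the built-in call is simply less code to maintain.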
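Next I tried to take advantage of the TPL with Parallel.ForEach, but that throws the following exception:
An exception of type 'System.IO.InvalidDataException' occurred in System.IO.Compression.dll but was not handled in user code
Additional information: A local file header is corrupt.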
Parallel.ForEach(allZipEntries, currentEntry =>
{
    if (currentEntry.FullName.Equals(currentEntry.Name))
    {
        currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
    }
    else
    {
        var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
        Directory.CreateDirectory(subDirectoryPath);
        currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
    }
});
To avoid this I can use a lock, but that defeats the whole purpose.
// thisLock is a shared field, e.g. private readonly object thisLock = new object();
Parallel.ForEach(allZipEntries, currentEntry =>
{
    lock (thisLock)
    {
        if (currentEntry.FullName.Equals(currentEntry.Name))
        {
            currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
        }
        else
        {
            var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
            Directory.CreateDirectory(subDirectoryPath);
            currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
        }
    }
});
Is there a better way to extract the files?
Answer 0 (score: 2)
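ZipFile is explicitly documented as not guaranteed to be thread-safe for instance members. (This is no longer mentioned on the page; see the snapshot from Nov 2016.)
What you are trying to do cannot be done with this library. There may be some other library that supports multiple threads per zip file, but I wouldn't count on it.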
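You can use multithreading to decompress several zip files at the same time, but not to decompress multiple entries of the same zip file.
Answer 1 (score: 1)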
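A minimal sketch of that suggestion, parallelizing across archives rather than across entries. It assumes each archive should be extracted into its own folder named after the zip file; that layout and the `MultiArchiveExtract` helper are hypothetical. Each iteration calls `ZipFile.ExtractToDirectory`, which opens and disposes its own archive, so no `ZipArchive` is shared between threads.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class MultiArchiveExtract
{
    // Parallelism is across archives: every iteration works on a different zip file.
    public static void ExtractArchivesInParallel(IEnumerable<string> zipPaths, string outputRoot)
    {
        Parallel.ForEach(zipPaths, zipPath =>
        {
            // Hypothetical layout: one output folder per archive, named after the file.
            string destination = Path.Combine(outputRoot, Path.GetFileNameWithoutExtension(zipPath));
            ZipFile.ExtractToDirectory(zipPath, destination);
        });
    }
}
```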
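Parallel writing/reading is not a good idea, because the hard drive controller only services requests one at a time. With multiple threads you just add overhead (queuing the requests and so on) without gaining anything.
Try reading the file into memory first. That will avoid your exception, but if you benchmark it you will probably find that it is actually slower because of the overhead of the extra threads.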
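One way to read the in-memory suggestion, sketched under the assumption that the whole archive fits in memory: load the bytes once with `File.ReadAllBytes`, then give each range of entries its own read-only `MemoryStream` and its own `ZipArchive`, so no stream position or archive state is shared between threads. The `InMemoryParallelExtract` name is illustrative only, and as the answer says, only a benchmark on your own data can tell whether this beats the plain loop.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class InMemoryParallelExtract
{
    public static void Extract(string zipFileFullPath, string tempPath)
    {
        // Read the archive from disk once; workers only ever read this array.
        byte[] zipBytes = File.ReadAllBytes(zipFileFullPath);

        int entryCount;
        using (var probe = new ZipArchive(new MemoryStream(zipBytes, false), ZipArchiveMode.Read))
        {
            entryCount = probe.Entries.Count;
        }
        if (entryCount == 0) return; // Partitioner.Create rejects an empty range

        // Each range of entry indices gets a private MemoryStream and ZipArchive.
        Parallel.ForEach(Partitioner.Create(0, entryCount), range =>
        {
            using (var archive = new ZipArchive(new MemoryStream(zipBytes, false), ZipArchiveMode.Read))
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    ZipArchiveEntry entry = archive.Entries[i];
                    string targetPath = Path.Combine(tempPath, entry.FullName);
                    if (entry.Name.Length == 0) // directory entry
                    {
                        Directory.CreateDirectory(targetPath);
                        continue;
                    }
                    Directory.CreateDirectory(Path.GetDirectoryName(targetPath));
                    entry.ExtractToFile(targetPath);
                }
            }
        });
    }
}
```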
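If the files are very large and decompression takes a long time, running the decompression in parallel can improve speed, but parallel IO reads/writes won't. Most decompression libraries are already multithreaded anyway, so you would only gain anything if this library isn't.
Edit: below is a hacky way to make the library thread-safe. Depending on the zip archive, it runs slower or at the same speed, which shows that this does not benefit from parallelism.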
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.IO;
using System.IO.Compression; // .NET Framework also needs a reference to System.IO.Compression.FileSystem
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        const string zipPath = @"c:\temp\temp.zip";
        const string outputDir = @"c:\temp\output\";
        Array.ForEach(Directory.GetFiles(outputDir), File.Delete);

        Stopwatch timer = Stopwatch.StartNew();
        int numberOfThreads = 8;

        // Open the archive once per thread so no ZipArchive (or its stream) is shared between threads
        var archives = new List<ZipArchive>();
        var clonedZipEntries = new List<ReadOnlyCollection<ZipArchiveEntry>>();
        for (int i = 0; i < numberOfThreads; i++)
        {
            ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Read);
            archives.Add(archive);
            clonedZipEntries.Add(archive.Entries);
        }

        int totalZipEntries = clonedZipEntries[0].Count;
        int numberOfEntriesPerThread = totalZipEntries / numberOfThreads;

        Func<object, int> action = (object thread) =>
        {
            int threadNumber = (int)thread;
            int startIndex = numberOfEntriesPerThread * threadNumber;
            // The last thread also takes the remainder when the count doesn't divide evenly
            int endIndex = threadNumber == numberOfThreads - 1
                ? totalZipEntries
                : startIndex + numberOfEntriesPerThread;
            for (int i = startIndex; i < endIndex; i++)
            {
                ZipArchiveEntry entry = clonedZipEntries[threadNumber][i];
                if (string.IsNullOrEmpty(entry.Name)) continue; // skip directory entries
                Console.WriteLine($"Extracting {entry.Name} via thread {threadNumber}");
                // Output is flattened: entries with the same Name in different folders would collide
                entry.ExtractToFile(Path.Combine(outputDir, entry.Name));
            }
            return 0;
        };

        // Construct the tasks
        var tasks = new List<Task<int>>();
        for (int threadNumber = 0; threadNumber < numberOfThreads; threadNumber++)
            tasks.Add(Task<int>.Factory.StartNew(action, threadNumber));
        Task.WaitAll(tasks.ToArray());
        timer.Stop();
        var threadedTimer = timer.ElapsedMilliseconds;
        archives.ForEach(a => a.Dispose());

        Array.ForEach(Directory.GetFiles(outputDir), File.Delete);
        timer.Restart();
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Read))
        {
            foreach (var entry in archive.Entries)
            {
                if (string.IsNullOrEmpty(entry.Name)) continue; // skip directory entries
                Console.WriteLine($"Extracting {entry.Name} via thread 1");
                entry.ExtractToFile(Path.Combine(outputDir, entry.Name));
            }
        }
        timer.Stop();
        Console.WriteLine($"Threaded version took: {threadedTimer} ms");
        Console.WriteLine($"Non-Threaded version took: {timer.ElapsedMilliseconds} ms");
        Console.ReadLine();
    }
}
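The same "one archive per thread" idea can also be expressed with the `localInit`/`localFinally` overload of `Parallel.ForEach`, which gives each worker task its own `ZipArchive` and disposes it when the task finishes, instead of splitting the entries by hand. This is a sketch with a made-up class name (`PerWorkerArchiveExtract`); unlike the benchmark above it keeps the folder structure from `FullName`, and it makes no claim of being faster.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class PerWorkerArchiveExtract
{
    public static void Extract(string zipPath, string outputDir)
    {
        int entryCount;
        using (ZipArchive probe = ZipFile.OpenRead(zipPath))
        {
            entryCount = probe.Entries.Count;
        }
        if (entryCount == 0) return; // Partitioner.Create rejects an empty range

        Parallel.ForEach(
            Partitioner.Create(0, entryCount),
            // localInit: runs once per worker task, so each task owns one archive
            () => ZipFile.OpenRead(zipPath),
            (Tuple<int, int> range, ParallelLoopState state, ZipArchive archive) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    ZipArchiveEntry entry = archive.Entries[i];
                    string targetPath = Path.Combine(outputDir, entry.FullName);
                    if (entry.Name.Length == 0) // directory entry
                    {
                        Directory.CreateDirectory(targetPath);
                        continue;
                    }
                    Directory.CreateDirectory(Path.GetDirectoryName(targetPath));
                    entry.ExtractToFile(targetPath);
                }
                return archive; // reuse the same archive for this task's next range
            },
            // localFinally: dispose each worker's archive when its task finishes
            archive => archive.Dispose());
    }
}
```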