Parallel.ForEach throws an exception when extracting a zip file

Asked: 2017-05-23 01:20:04

Tags: c# foreach zip task-parallel-library parallel.foreach

I am reading the contents of a zip file and trying to extract them.

  var allZipEntries = ZipFile.Open(zipFileFullPath, ZipArchiveMode.Read).Entries;

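If I extract with a plain foreach loop, this works fine. The drawback is that it is equivalent to the zip.extract method, so I gain nothing when I intend to extract all the files.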

    foreach (var currentEntry in allZipEntries)
    {
        if (currentEntry.FullName.Equals(currentEntry.Name))
        {
            currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
        }
        else
        {
            var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
            Directory.CreateDirectory(subDirectoryPath);
            currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
        }
    }
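
For comparison, the built-in whole-archive call that the loop above roughly corresponds to could look like the sketch below. The class and method names are hypothetical; `zipFileFullPath` and `tempPath` are the same values as in the question. Note that this two-argument overload throws an IOException if a target file already exists.

```csharp
using System.IO.Compression;

// Hypothetical helper, shown only to illustrate the sequential baseline.
static class WholeArchiveExtraction
{
    // Extracts every entry of the archive on the calling thread,
    // recreating the sub-directories stored in the entry names.
    public static void ExtractAll(string zipFileFullPath, string tempPath)
    {
        ZipFile.ExtractToDirectory(zipFileFullPath, tempPath);
    }
}
```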

To take advantage of the TPL I then tried Parallel.ForEach, but that throws the following exception:

  

    An exception of type 'System.IO.InvalidDataException' occurred in System.IO.Compression.dll but was not handled in user code

    Additional information: A local file header is corrupt.

    Parallel.ForEach(allZipEntries, currentEntry =>
    {
        if (currentEntry.FullName.Equals(currentEntry.Name))
        {
            currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
        }
        else
        {
            var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
            Directory.CreateDirectory(subDirectoryPath);
            currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
        }
    });

To avoid this I can use a lock, but that defeats the whole purpose.

    Parallel.ForEach(allZipEntries, currentEntry =>
    {
        lock (thisLock)
        {
            if (currentEntry.FullName.Equals(currentEntry.Name))
            {
                currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
            }
            else
            {
                var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
                Directory.CreateDirectory(subDirectoryPath);
                currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
            }
        }
    });

Is there a better way to extract the files?

2 Answers:

Answer 0 (score: 2)

ZipFile is explicitly documented as not guaranteed to be thread-safe for instance members. The current page no longer mentions this; see the snapshot from Nov 2016.

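What you are trying to do cannot be done with this library. There may be other libraries that support multiple threads per zip file, but I would not expect it.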

You can use multiple threads to unzip several zip files at the same time, but not to extract multiple entries from the same zip file.
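
A minimal sketch of that "one archive per thread" approach is below. The class name, method name, and the `zipPaths` and `outputRoot` parameters are hypothetical, and the sketch assumes every archive has a distinct file name so the target directories do not collide.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper: parallelism across archives, not within one archive.
static class MultiArchiveExtraction
{
    public static void ExtractEach(IEnumerable<string> zipPaths, string outputRoot)
    {
        Parallel.ForEach(zipPaths, zipPath =>
        {
            // Each iteration opens and closes its own archive, so no
            // ZipArchive instance is ever shared between threads.
            // Assumption: archive file names are unique, giving each its own directory.
            var target = Path.Combine(outputRoot, Path.GetFileNameWithoutExtension(zipPath));
            ZipFile.ExtractToDirectory(zipPath, target);
        });
    }
}
```

Whether this is actually faster than a sequential loop depends on the disk and on how expensive the decompression is, so it is worth benchmarking before relying on it.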

Answer 1 (score: 1)

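Writing and reading in parallel is not a good idea, because the hard drive controller only runs requests one at a time. With multiple threads you just add overhead and queue the requests up, with no gain.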

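Try reading the file into memory first. That avoids your exception, but if you benchmark it you may find it is actually slower because of the overhead of the extra threads.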
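
One way to read "into memory first" is sketched below, under a few assumptions: the archive fits in memory, entry names are unique, and the entry names are trusted (no path traversal checks are done). The class and method names are hypothetical. The zip bytes are loaded once, and each worker thread gets its own ZipArchive over a read-only MemoryStream on the shared byte array, using the Parallel.ForEach overload with thread-local state.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: one in-memory copy of the zip, one ZipArchive per worker.
static class InMemoryParallelExtraction
{
    public static void Extract(string zipPath, string outputDir)
    {
        // Read the archive once; the array is never written to afterwards.
        byte[] zipBytes = File.ReadAllBytes(zipPath);

        // Collect the names of file entries (directory entries have an empty Name).
        List<string> entryNames;
        using (var index = new ZipArchive(new MemoryStream(zipBytes, writable: false), ZipArchiveMode.Read))
        {
            entryNames = index.Entries
                .Where(e => e.Name.Length > 0)
                .Select(e => e.FullName)
                .ToList();
        }

        // Create all target directories up front, on a single thread.
        foreach (var name in entryNames)
        {
            Directory.CreateDirectory(Path.GetDirectoryName(Path.Combine(outputDir, name)));
        }

        Parallel.ForEach(
            entryNames,
            // localInit: each worker gets a private archive with its own stream position.
            () => new ZipArchive(new MemoryStream(zipBytes, writable: false), ZipArchiveMode.Read),
            (name, loopState, archive) =>
            {
                archive.GetEntry(name).ExtractToFile(Path.Combine(outputDir, name), overwrite: true);
                return archive;
            },
            // localFinally: dispose the worker's private archive.
            archive => archive.Dispose());
    }
}
```

This removes the shared archive stream that the original Parallel.ForEach was fighting over, but it does not by itself guarantee a speedup.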

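If the files are very large and decompression takes a long time, running the decompression in parallel can speed things up, but the IO reads/writes will not get faster. Most decompression libraries are already multithreaded anyway, so you would only see a performance gain if this one is not.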

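Edit: below is a hacky way to make the library thread-safe. Depending on the zip archive it runs slower than or about the same as the sequential version, which indicates that this does not benefit from parallelism.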

Array.ForEach(Directory.GetFiles(@"c:\temp\output\"), File.Delete);

Stopwatch timer = new Stopwatch();
timer.Start();
int numberOfThreads = 8;
var clonedZipEntries = new List<ReadOnlyCollection<ZipArchiveEntry>>();

for (int i = 0; i < numberOfThreads; i++)
{
    clonedZipEntries.Add(ZipFile.Open(@"c:\temp\temp.zip", ZipArchiveMode.Read).Entries);
}
int totalZipEntries = clonedZipEntries[0].Count;
int numberOfEntriesPerThread = totalZipEntries / numberOfThreads;

Func<object,int> action = (object thread) =>
{
    int threadNumber = (int)thread;
    int startIndex = numberOfEntriesPerThread * threadNumber;
    int endIndex = startIndex + numberOfEntriesPerThread;
    if (endIndex > totalZipEntries) endIndex = totalZipEntries;

    for (int i = startIndex; i < endIndex; i++)
    {
        Console.WriteLine($"Extracting {clonedZipEntries[threadNumber][i].Name} via thread {threadNumber}");
        clonedZipEntries[threadNumber][i].ExtractToFile($@"C:\temp\output\{clonedZipEntries[threadNumber][i].Name}");
    }

    //Check for any remainders due to non evenly divisible size
    if (threadNumber == numberOfThreads - 1 && endIndex < totalZipEntries)
    {
        for (int i = endIndex; i < totalZipEntries; i++)
        {
            Console.WriteLine($"Extracting {clonedZipEntries[threadNumber][i].Name} via thread {threadNumber}");
            clonedZipEntries[threadNumber][i].ExtractToFile($@"C:\temp\output\{clonedZipEntries[threadNumber][i].Name}");
        }
    }
    return 0;
};


//Construct the tasks
var tasks = new List<Task<int>>();
for (int threadNumber = 0; threadNumber < numberOfThreads; threadNumber++) tasks.Add(Task<int>.Factory.StartNew(action, threadNumber));

Task.WaitAll(tasks.ToArray());
timer.Stop();

var threaderTimer = timer.ElapsedMilliseconds;



Array.ForEach(Directory.GetFiles(@"c:\temp\output\"), File.Delete);

timer.Reset();
timer.Start();
var entries = ZipFile.Open(@"c:\temp\temp.zip", ZipArchiveMode.Read).Entries;
foreach (var entry in entries)
{
    Console.WriteLine($"Extracting {entry.Name} via thread 1");
    entry.ExtractToFile($@"C:\temp\output\{entry.Name}");
}
timer.Stop();

Console.WriteLine($"Threaded version took: {threaderTimer} ms");
Console.WriteLine($"Non-Threaded version took: {timer.ElapsedMilliseconds} ms");


Console.ReadLine();