Question

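I want to process a large amount of text data and then save it to the hard drive in zip archives. The task is complicated by the fact that the processing should be multithreaded.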

...
ZipSaver saver = new ZipSaver(10000); // 10000 is the number of items after which the archive is saved to the hard drive
Parallel.ForEach(source, item => {
    string workResult = ModifyItem(item);
    saver.AddItem(workResult);
});

Part of the ZipSaver class (it uses the Ionic ZipFile library)

private ConcurrentQueue<ZipFile> _pool;
public void AddItem(string src){
    ZipFile currentZipFile;
    // if no archive is available in the pool, create a new one
    if(_pool.TryDequeue(out currentZipFile) == false){
        currentZipFile = InitNewZipFile();
    }
    currentZipFile.AddEntry(path, src);
    // if, after the item is added, the archive has reached the maximum number of entries
    // specified in the constructor, save it to the hard drive;
    // otherwise return the archive to the common pool
    if(currentZipFile.Entries.Count >= _maxEntries){
        SaveZip(currentZipFile);
    }else{
        _pool.Enqueue(currentZipFile);
    }
}

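Of course, I can play with the maximum number of items per archive, but the right value depends on the size of the output file, which ideally should be configurable. As it stands, a collection with many items, processed in the loop, spawns many threads, practically each with its own `ZipFile` instance, and that leads to running out of RAM. How can I improve the saving mechanism? Sorry for my English =)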

Answer 1

How about limiting the number of concurrent threads? That will limit the number of ZipFile instances in the queue. For example:

Parallel.ForEach(source, 
    new ParallelOptions { MaxDegreeOfParallelism = 3 },
    item => 
    {
        string workResult = ModifyItem(item);
        saver.AddItem(workResult);
    });

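It may also be that 10,000 items is too many. If the files you're adding are each 1 megabyte in size, then 10,000 of them add up to about 10 gigabytes of data in a single archive. That could well make you run out of memory.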

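You need to limit the zip files by size rather than by number of files. I don't know whether DotNetZip will let you see how many bytes are currently in the output buffer. If nothing else, you can estimate the compression ratio and use it to limit the size by counting uncompressed bytes. That is, if you expect a 50% compression ratio and want to limit the output file size to 1 gigabyte, then you need to limit the total input to 2 gigabytes (i.e. 1 GB / 0.5 = 2 GB).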
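A minimal sketch of that bookkeeping, assuming a hypothetical `SizeBudget` helper (it is not part of DotNetZip or of the `ZipSaver` in the question) and an expected compression ratio that you would have to measure on your own data. It counts uncompressed UTF-8 bytes and reports when the estimated output reaches the target:

```csharp
using System;
using System.Text;

// Hypothetical helper: tracks the uncompressed bytes added to one archive and
// tells the caller when the estimated compressed size reaches the target.
// Not thread-safe by itself; meant to be used by whichever thread currently
// holds the archive it belongs to.
public sealed class SizeBudget
{
    private long _inputBytes;

    // expectedRatio = compressed size / uncompressed size (0.5 means 50%).
    // The ratio is an assumption; real data may compress better or worse.
    public SizeBudget(long targetOutputBytes, double expectedRatio)
    {
        if (expectedRatio <= 0)
            throw new ArgumentOutOfRangeException(nameof(expectedRatio));
        MaxInputBytes = (long)(targetOutputBytes / expectedRatio);
    }

    // Total uncompressed bytes allowed before the archive should be saved.
    public long MaxInputBytes { get; }

    // Records one item; returns true once the archive should be saved.
    public bool Add(string item)
    {
        _inputBytes += Encoding.UTF8.GetByteCount(item);
        return _inputBytes >= MaxInputBytes;
    }
}
```

With a 1 GB target and a ratio of 0.5, `MaxInputBytes` works out to 2 GB, matching the arithmetic above. In `AddItem`, the `Entries.Count` check could be replaced by the result of `Add`, keeping one budget per `ZipFile`.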

It would be best if you could see the current output size. I'm not familiar with DotNetZip, so I can't say whether it has that capability.
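For comparison, here is a sketch with a different library: the framework's own `System.IO.Compression.ZipArchive`, not DotNetZip. It relies on the assumption that in `ZipArchiveMode.Create` an entry's compressed data has been written to the underlying stream by the time the entry stream is closed, so the file stream's `Length` approximates the output size so far. The class name `SizeLimitedZipWriter` and the `partN.zip` / `itemN.txt` naming are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical writer: keeps one archive open on disk and starts a new one
// when the current file reaches maxBytes. Not thread-safe; callers must
// serialize access, for example with a lock around AddItem.
public sealed class SizeLimitedZipWriter : IDisposable
{
    private readonly string _directory;
    private readonly long _maxBytes;
    private FileStream _stream;
    private ZipArchive _archive;
    private int _archiveIndex;
    private int _entryIndex;

    public SizeLimitedZipWriter(string directory, long maxBytes)
    {
        _directory = directory;
        _maxBytes = maxBytes;
        Directory.CreateDirectory(directory);
    }

    // Number of archive files opened so far.
    public int ArchivesStarted => _archiveIndex;

    public void AddItem(string text)
    {
        if (_archive == null) OpenNext();
        ZipArchiveEntry entry = _archive.CreateEntry($"item{_entryIndex++}.txt");
        using (var writer = new StreamWriter(entry.Open(), Encoding.UTF8))
            writer.Write(text);
        // The entry is closed here, so its compressed bytes count toward Length.
        if (_stream.Length >= _maxBytes) CloseCurrent();
    }

    private void OpenNext()
    {
        string path = Path.Combine(_directory, $"part{_archiveIndex++}.zip");
        _stream = new FileStream(path, FileMode.Create);
        _archive = new ZipArchive(_stream, ZipArchiveMode.Create, leaveOpen: true);
    }

    private void CloseCurrent()
    {
        _archive.Dispose(); // writes the central directory
        _stream.Dispose();
        _archive = null;
        _stream = null;
    }

    public void Dispose()
    {
        if (_archive != null) CloseCurrent();
    }
}
```

Because entries go to the file as they are added, the archive does not have to sit in memory until it is saved. The trade-off is that writes are serialized, so only the `ModifyItem` step would run in parallel.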

How can I improve my algorithm for storing data on the hard disk?
