Question

SemaphoreSlim sm = new SemaphoreSlim(10);

using (FileStream fileStream = File.OpenRead("..."))
using (StreamReader streamReader = new StreamReader(fileStream, Encoding.UTF8, true, 4096))
{
    String line;
    while ((line = streamReader.ReadLine()) != null)
    {
        sm.Wait();
        new Thread(() =>
        {
            doSomething(line);
            sm.Release();
        }).Start();
    }
}
MessageBox.Show("This should only show once doSomething() has done its LAST line.");

所以，我有一个非常大的文件，我想在每一行上执行代码。

我想在并行中进行，但一次最多只能进行10次。

我的解决方案是在线程完成时使用SemaphoreSlim等待并释放。（由于函数是同步的，因此放置.Release（）可以工作）。

问题是代码占用了大量的CPU。内存正如预期的那样，而不是超过400mb的负载，它每隔几秒就会上下几个mbs。

但是CPU变得疯狂，它的大部分时间都锁定在100％，持续了30秒，然后稍微下降并返回。

由于我不想将每一行加载到内存中，并且想要按原样运行代码，这里最好的解决方案是什么？

在<9,700行文件中

500行。

在270万行文件中输入600行。

修改

我根据评论中提到的从new Thread(()=>{}).Start();更改为Task.Factory.StartNew(()=>{});，似乎线程创建和销毁导致性能下降。它似乎是正确的。在我转移到Task.Factory.StartNew之后，它运行与Semaphore提到的相同的速度，它的CPU就像我的Parallel.ForEach代码版本。

Answer 1

您的代码会创建大量线程，效率很低。 C＃有更简单的方法来处理您的场景。一种方法是：

var imageScraper = new imagescraper();
var images;

Meteor.methods({
    scrapeImgs(url){
        imageScraper.on('image', (image) => {
           images.push(image);
        });

        images = [];
        imageScraper.address = url;
        imageScraper.scrape();

        imageScraper.on('end', () => {
            return images; //does not work
        });

        return images; // returns an empty array
    },
});

File.ReadLines不读取整个文件，它逐行读取。
使用WithDegreeOfParallelism设置最大并发执行任务数
使用ForAll在每一行上启动方法。

C＃ - 使用StreamReader进行循环并行化导致高CPU

500行。

在270万行文件中输入600行。

修改

1 个答案: