Question

I have a file that I'm trying to parse, and this is what I'm doing:

var definitions = new Dictionary<int, string>();

foreach (var line in new RirStatFile("delegated-lacnic-latest.txt"))
{
    for (var i = 0; i < line.Range; i++)
    {
        definitions[line.StartIpAddress + i] = line.Iso3166CountryCode;
    }
}

new RirStatFile(...) returns an IEnumerable<RirStatFileLine> with a .Count of {1,1} RirStatFileLine objects, where each RirStatFileLine has a .Range that is typically between 32,768 and 1 million.

The snippet above takes about 45 seconds to run on my poor netbook.

Edit: it's a dual-core netbook.

A good place to use the new Task Parallel Library, right? That's what I thought, so I changed the code to:

var definitions = new ConcurrentDictionary<int, string>();

Parallel.ForEach(new RirStatFile("delegated-lacnic-latest.txt"), line => 
{
    Parallel.For(0, line.Range, i =>
    {
        definitions[line.StartIpAddress + i] = line.Iso3166CountryCode;
    });
});

Guess what? The program now takes 200 seconds!

What gives? Obviously I don't understand what is going on here. For reference, here is RirStatFileLine:

using System;
using System.Net;

public class RirStatFileLine
{
    public readonly string Iso3166CountryCode;
    public readonly int StartIpAddress;
    public readonly int Range;

    public RirStatFileLine(string line)
    {
        var segments = line.Split('|');

        // Line:         
        //    lacnic|BR|ipv4|143.54.0.0|65536|19900828|assigned
        // Translation:
        //    rir_name|ISO_countryCode|ipVersion|ipAddress|range|dateStamp|blah

        this.Iso3166CountryCode = segments[1];
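        // GetAddressBytes() is in network (big-endian) order; convert to host
        // order so that StartIpAddress + i addresses consecutive IPs.
        this.StartIpAddress = IPAddress.NetworkToHostOrder(
            BitConverter.ToInt32(IPAddress.Parse(segments[3]).GetAddressBytes(), 0));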
        this.Range = int.Parse(segments[4]);
    }
}
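A note on how StartIpAddress is computed: IPAddress.GetAddressBytes returns the address bytes in network (big-endian) order, whereas BitConverter.ToInt32 interprets them in the machine's own byte order, which is little-endian on x86. On such machines the raw value has its bytes reversed, so adding i to it does not step through consecutive addresses. A minimal sketch of a host-order conversion using IPAddress.NetworkToHostOrder; the helper name ToHostOrderInt is hypothetical, not part of the question's code:

```csharp
using System;
using System.Net;

// Hypothetical helper: dotted IPv4 string -> int in host byte order, so that
// consecutive addresses map to consecutive integers.
static int ToHostOrderInt(string dotted)
{
    byte[] bytes = IPAddress.Parse(dotted).GetAddressBytes(); // network order
    int raw = BitConverter.ToInt32(bytes, 0);                 // machine order
    // NetworkToHostOrder swaps the bytes only on little-endian hosts.
    return IPAddress.NetworkToHostOrder(raw);
}

int start = ToHostOrderInt("143.54.0.0");
Console.WriteLine($"{unchecked((uint)start):X8}"); // 8F360000  (143 = 0x8F, 54 = 0x36)

// Round trip: start + 1 converts back to the next address.
var next = new IPAddress(BitConverter.GetBytes(IPAddress.HostToNetworkOrder(start + 1)));
Console.WriteLine(next); // 143.54.0.1
```

Without the NetworkToHostOrder call, on a little-endian machine start + 1 would change the first octet instead of the last one.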

And here is RirStatFile:

using System.Collections.Generic;
using System.IO;
using System.Linq;

public class RirStatFile : IEnumerable<RirStatFileLine>
{
    private const int headerLineLength = 4;

    private readonly IEnumerable<RirStatFileLine> lines;

    public RirStatFile(string filepath)
    {
        this.lines = File.ReadAllLines(filepath)
                         .Skip(RirStatFile.headerLineLength)
                         .Select(line => new RirStatFileLine(line)); 
    }

    public IEnumerator<RirStatFileLine> GetEnumerator()
    {
        return this.lines.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.lines.GetEnumerator();
    }
}

Answer 1

No surprise here. You are doing a very cheap operation (adding an entry to a dictionary) and wrapping it in some expensive parallelization code.

You should parallelize code that is computationally expensive, not trivial code.
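A minimal timing sketch of this point, run on synthetic data rather than the LACNIC file (20 hypothetical ranges of 50,000 keys each, all mapped to "BR"). The first loop mirrors the question's sequential version; the second mirrors the nested Parallel.For over a ConcurrentDictionary, where each delegate invocation performs only a single dictionary write. Timings depend on the machine, so no figures are claimed here:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Synthetic workload (hypothetical): 20 ranges of 50,000 keys each.
const int Ranges = 20, RangeSize = 50_000;

// Sequential: plain Dictionary, no synchronization, no delegate calls.
var sw = Stopwatch.StartNew();
var seq = new Dictionary<int, string>();
for (int r = 0; r < Ranges; r++)
    for (int i = 0; i < RangeSize; i++)
        seq[r * RangeSize + i] = "BR";
Console.WriteLine($"sequential:          {sw.ElapsedMilliseconds} ms");

// Fine-grained parallel: one delegate invocation per key, and every write
// goes through ConcurrentDictionary's internal locking.
sw.Restart();
var par = new ConcurrentDictionary<int, string>();
Parallel.For(0, Ranges, r =>
    Parallel.For(0, RangeSize, i => par[r * RangeSize + i] = "BR"));
Console.WriteLine($"nested Parallel.For: {sw.ElapsedMilliseconds} ms");
```

Both loops produce the same 1,000,000 entries. If the nested version is slower on a given machine, that matches the answer: the per-item scheduling and synchronization cost more than the single insert each item performs.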

Also, you are using ReadAllLines instead of ReadLines, so there is no chance to overlap any processing with reading the file.

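MSDN: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient."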
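A sketch of the RirStatFile query rewritten with File.ReadLines, so that each line is parsed as it is read. To keep it self-contained, a named tuple stands in for RirStatFileLine, the helper name ReadRecords is hypothetical, and the demo file (four placeholder header lines plus the sample record from the question's comment) is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// File.ReadLines returns a lazy IEnumerable<string>: lines are read as the
// caller enumerates, so parsing starts before the whole file is in memory.
// Skip(4) mirrors the question's headerLineLength.
static IEnumerable<(string Country, string Ip, int Range)> ReadRecords(string filePath) =>
    File.ReadLines(filePath)
        .Skip(4)
        .Select(line => line.Split('|'))
        .Select(s => (Country: s[1], Ip: s[3], Range: int.Parse(s[4])));

// Hypothetical demo file.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "header 1", "header 2", "header 3", "header 4",
    "lacnic|BR|ipv4|143.54.0.0|65536|19900828|assigned",
});

var records = ReadRecords(path).ToList(); // enumeration completes, file is closed
File.Delete(path);

foreach (var rec in records)
    Console.WriteLine($"{rec.Country} {rec.Ip} {rec.Range}"); // BR 143.54.0.0 65536
```

Because the query is lazy, a caller that enumerates it directly (instead of calling ToList) keeps the file open until the enumeration finishes.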

Answer 2

The problem here is that your netbook has only one CPU/core/hardware thread. Parallelization is of no use here at all.
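The premise of this answer can be checked on a given machine: Environment.ProcessorCount returns the number of logical processors available to the current process. A minimal sketch (variable names are illustrative):

```csharp
using System;

// Number of logical processors (hardware threads) visible to this process.
int cores = Environment.ProcessorCount;

// With a single hardware thread, parallel loops can only time-slice the work,
// adding scheduling overhead without any speedup.
bool canRunInParallel = cores > 1;
Console.WriteLine($"{cores} logical processor(s); parallel execution possible: {canRunInParallel}");
```

If it reports more than one processor, the single-core explanation does not apply, and the overhead explanation in Answer 1 is the relevant one.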

ConcurrentDictionary: super slow?
