Optimizing LINQ that combines multiple lists into a new generic list

Time: 2017-06-05 21:41:52

Tags: c# linq

Given the following three lists:

    var FirstNames = new List<string>(){ "Bob", "Sondra", "Avery", "Von", "Randle", "Gwen", "Paisley" };
    var LastNames = new List<string>(){ "Anderson", "Carlson", "Vickers", "Black", "Schultz", "Marigold", "Johnson" };
    var Birthdates = new List<DateTime>()
                    { 
                        Convert.ToDateTime("11/12/1980"), 
                        Convert.ToDateTime("09/16/1978"), 
                        Convert.ToDateTime("05/18/1985"), 
                        Convert.ToDateTime("10/29/1980"), 
                        Convert.ToDateTime("01/19/1989"), 
                        Convert.ToDateTime("01/14/1972"), 
                        Convert.ToDateTime("02/20/1981") 
                    };

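I want to combine them into a new generic type, where the relationship the lists share is their position in the collection; that is, FirstNames[0], LastNames[0] and Birthdates[0] belong together.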

So I came up with this LINQ query, which matches on the indices and seems to work fine for now:

    var students = from fn in FirstNames
                   from ln in LastNames
                   from bd in Birthdates
                   where FirstNames.IndexOf(fn) == LastNames.IndexOf(ln)
                   where FirstNames.IndexOf(fn) == Birthdates.IndexOf(bd)
                   select new { First = fn, Last = ln, Birthdate = bd.Date };

However, I stress-tested this code (each List<string> and List<DateTime> loaded with a few million records) and ran into a System.OutOfMemoryException.

Is there another way to write this query with LINQ that achieves the same result more efficiently?

3 answers:

Answer 0 (score: 3)

This is what Zip is for.

var result = FirstNames
  .Zip(LastNames, (f,l) => new {f,l})
  .Zip(Birthdates, (fl, b) => new {First=fl.f, Last = fl.l, BirthDate = b}); // Birthdates: the list name used in the question
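To make the behaviour concrete, here is a minimal, self-contained sketch of the same chained Zip on a shortened copy of the question's data. Zip pairs elements by position in a single pass, so it avoids the triple `from`, which enumerates every combination of the three lists and calls IndexOf for each one (and IndexOf returns the first match, so duplicate names would pair up incorrectly). The lower-case variable names and the deliberately shorter date list are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

var firstNames = new List<string> { "Bob", "Sondra", "Avery" };
var lastNames = new List<string> { "Anderson", "Carlson", "Vickers" };

// Parsed with an explicit format and the invariant culture so the result
// does not depend on the machine's regional settings.
// Only two dates on purpose: Zip stops at the shortest sequence.
var birthdates = new List<DateTime>
{
    DateTime.ParseExact("11/12/1980", "MM/dd/yyyy", CultureInfo.InvariantCulture),
    DateTime.ParseExact("09/16/1978", "MM/dd/yyyy", CultureInfo.InvariantCulture),
};

var students = firstNames
    .Zip(lastNames, (f, l) => new { f, l })
    .Zip(birthdates, (fl, b) => new { First = fl.f, Last = fl.l, Birthdate = b })
    .ToList();

foreach (var s in students)
{
    Console.WriteLine($"{s.First} {s.Last}: {s.Birthdate:yyyy-MM-dd}");
}
```

Note that the third name is silently dropped because there is no matching date; if the lists are required to have equal lengths, it is worth checking that before zipping.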

Regarding scaling:

int count = 50000000;
var FirstNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var LastNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var BirthDates = Enumerable.Range(0, count).Select(x=> DateTime.Now.AddSeconds(x));

var sw = new Stopwatch();
sw.Start();

var result = FirstNames
  .Zip(LastNames, (f,l) => new {f,l})
  .Zip(BirthDates, (fl, b) => new {First=fl.f, Last = fl.l, BirthDate = b});

foreach(var r in result)
{
    var x = r;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds); // Returns 69191 on my machine.

Whereas these run out of memory:

int count = 50000000;
var FirstNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var LastNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var BirthDates = Enumerable.Range(0, count).Select(x=> DateTime.Now.AddSeconds(x));

var sw = new Stopwatch();
sw.Start();

var FirstNamesList = FirstNames.ToList(); // Blows up in 32-bit .NET with out of Memory
var LastNamesList = LastNames.ToList();
var BirthDatesList = BirthDates.ToList();

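// Property names match the second query below, so both produce the same
// anonymous type and the later assignment to result compiles.
var result = Enumerable.Range(0, FirstNamesList.Count)
    .Select(i => new 
                 { 
                     First = FirstNamesList[i], 
                     Last = LastNamesList[i], 
                     BirthDate = BirthDatesList[i] 
                 });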

result = BirthDatesList.Select((bd, i) => new
{ 
    First = FirstNamesList[i], 
    Last = LastNamesList[i], 
    BirthDate = bd 
});

foreach(var r in result)
{
    var x = r;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

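At lower values, the cost of converting the Enumerables to Lists is also much higher than the cost of creating the extra objects. Zip was roughly 30% faster than the indexed versions. As you add more columns, Zip's advantage would probably shrink.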

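The performance characteristics are also very different. The Zip routine starts outputting answers almost immediately, whereas the others only start once the entire Enumerables have been read and converted to Lists. So if you take the results and paginate them with .Skip(x).Take(y), or check whether something exists with .Any(...), Zip will be much faster, because it doesn't have to convert the entire enumerable.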
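The streaming behaviour can be observed with a small sketch like the one below. It counts how many source items are actually produced when only one page of results is requested; the `pulled` counter and the page offsets are assumptions added for illustration.

```csharp
using System;
using System.Linq;

int pulled = 0;

// Large, lazily generated sources; nothing is produced until enumeration starts.
// The counter records how many first names were actually generated.
var firstNames = Enumerable.Range(0, 50_000_000).Select(i => { pulled++; return i.ToString(); });
var lastNames = Enumerable.Range(0, 50_000_000).Select(i => i.ToString());

// Request a single page: skip 10 rows, take the next 5.
var page = firstNames
    .Zip(lastNames, (f, l) => new { First = f, Last = l })
    .Skip(10)
    .Take(5)
    .ToList();

// Only a handful of source items are generated, nowhere near 50 million.
Console.WriteLine($"Rows in page: {page.Count}, source items pulled: {pulled}");
```

Calling ToList() on the sources first, as the indexed versions require, would instead generate all 50 million strings before the first row could be returned.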

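Lastly, if this becomes performance-critical and you need to materialize many results, you could consider extending Zip to handle an arbitrary number of Enumerables (shamelessly stolen from Jon Skeet - https://codeblog.jonskeet.uk/2011/01/14/reimplementing-linq-to-objects-part-35-zip/):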

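// Must be declared in a non-generic static class; the `this` modifier is what
// allows the call to be written as FirstNames.Zip(LastNames, BirthDates, ...).
public static IEnumerable<TResult> Zip<TFirst, TSecond, TThird, TResult>( 
    this IEnumerable<TFirst> first, 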
    IEnumerable<TSecond> second,
    IEnumerable<TThird> third, 
    Func<TFirst, TSecond, TThird, TResult> resultSelector) 
{ 
    using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
    using (IEnumerator<TSecond> iterator2 = second.GetEnumerator()) 
    using (IEnumerator<TThird> iterator3 = third.GetEnumerator()) 
    { 
        while (iterator1.MoveNext() && iterator2.MoveNext() && iterator3.MoveNext()) 
        { 
            yield return resultSelector(iterator1.Current, iterator2.Current, iterator3.Current); 
        } 
    } 
}

Then you can do this:

var result = FirstNames
  .Zip(LastNames, BirthDates, (f,l,b) => new {First=f,Last=l,BirthDate=b});

Now you don't even have the problem of creating intermediate objects, so you get the best of all worlds.
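As a side note that postdates this answer: on .NET 6 and later, to the best of my knowledge, Enumerable.Zip has a built-in three-sequence overload that yields value tuples named First, Second and Third, so a hand-written helper may not be needed there. A minimal sketch, assuming such a runtime and using lower-case variable names chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var firstNames = new List<string> { "Bob", "Sondra" };
var lastNames = new List<string> { "Anderson", "Carlson" };
var birthdates = new List<DateTime> { new DateTime(1980, 11, 12), new DateTime(1978, 9, 16) };

// Assumes .NET 6 or later: this overload takes no result selector and yields
// (First, Second, Third) tuples, which are value types rather than heap objects.
var students = firstNames
    .Zip(lastNames, birthdates)
    .Select(t => new { First = t.First, Last = t.Second, Birthdate = t.Third })
    .ToList();

Console.WriteLine(students.Count);
```

On older frameworks this overload does not exist, and the hand-written extension method above remains the way to go.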

Or use the implementation here to handle any number of them: Zip multiple/abitrary number of enumerables in C#
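For a rough idea of what an arbitrary-arity version can look like, here is a sketch. It is not the linked implementation: `ZipAll` is a hypothetical helper name, and it assumes all sequences share one element type, which is the price of accepting any number of them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: zips any number of same-typed sequences,
// stopping as soon as the shortest one is exhausted.
static IEnumerable<TResult> ZipAll<T, TResult>(
    IEnumerable<IEnumerable<T>> sources,
    Func<IReadOnlyList<T>, TResult> resultSelector)
{
    var enumerators = sources.Select(s => s.GetEnumerator()).ToList();
    try
    {
        // With no sources at all, yield nothing instead of looping forever.
        while (enumerators.Count > 0 && enumerators.All(e => e.MoveNext()))
        {
            // Copy the current row into a fresh array so callers can keep it.
            yield return resultSelector(enumerators.Select(e => e.Current).ToArray());
        }
    }
    finally
    {
        foreach (var e in enumerators) e.Dispose();
    }
}

var columns = new List<IEnumerable<string>>
{
    new[] { "Bob", "Sondra", "Avery" },
    new[] { "Anderson", "Carlson", "Vickers" },
    new[] { "1980", "1978" }, // shortest column: limits the output to two rows
};

var rows = ZipAll(columns, row => string.Join(" ", row)).ToList();

foreach (var r in rows)
{
    Console.WriteLine(r);
}
```

Unlike the fixed three-argument Zip, this allocates one array per row, so it trades some of the allocation savings for flexibility.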

Answer 1 (score: 3)

Another option is to use the Select overload that also supplies each element's index:

var result = Birthdates.Select((bd, i) => new
{ 
    First = FirstNames[i], 
    Last = LastNames[i], 
    Birthdate = bd 
});

Answer 2 (score: 2)

Yep, use a range generator:

var result = Enumerable.Range(0, FirstNames.Count)
    .Select(i => new 
                 { 
                     First = FirstNames[i], 
                     Last = LastNames[i], 
                     Birthdate = Birthdates[i] 
                 });
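One caveat with index-based approaches like this one: if the lists differ in length, the List<T> indexer throws an ArgumentOutOfRangeException once an index passes the end of the shortest list. A small sketch of one possible guard, clamping the range to the shortest list; the `rowCount` variable and the shortened sample data are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var firstNames = new List<string> { "Bob", "Sondra", "Avery" };
var lastNames = new List<string> { "Anderson", "Carlson", "Vickers" };
var birthdates = new List<DateTime> { new DateTime(1980, 11, 12), new DateTime(1978, 9, 16) };

// Use the shortest list's length so no index runs past the end of any list.
int rowCount = Math.Min(firstNames.Count, Math.Min(lastNames.Count, birthdates.Count));

var students = Enumerable.Range(0, rowCount)
    .Select(i => new
    {
        First = firstNames[i],
        Last = lastNames[i],
        Birthdate = birthdates[i]
    })
    .ToList();

Console.WriteLine(students.Count);
```

Whether truncating or failing fast is the better choice depends on the data; if mismatched lengths indicate a bug upstream, throwing explicitly may be preferable to clamping.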