Given the following three lists:
var FirstNames = new List<string>(){ "Bob", "Sondra", "Avery", "Von", "Randle", "Gwen", "Paisley" };
var LastNames = new List<string>(){ "Anderson", "Carlson", "Vickers", "Black", "Schultz", "Marigold", "Johnson" };
var Birthdates = new List<DateTime>()
{
Convert.ToDateTime("11/12/1980"),
Convert.ToDateTime("09/16/1978"),
Convert.ToDateTime("05/18/1985"),
Convert.ToDateTime("10/29/1980"),
Convert.ToDateTime("01/19/1989"),
Convert.ToDateTime("01/14/1972"),
Convert.ToDateTime("02/20/1981")
};
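I'd like to combine them into a new generic type, where the relationship the lists share is their position in the collection, i.e. FirstNames[0], LastNames[0] and Birthdates[0] are related.
So I came up with this LINQ query, which matches on index and currently seems to work fine: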
var students = from fn in FirstNames
from ln in LastNames
from bd in Birthdates
where FirstNames.IndexOf(fn) == LastNames.IndexOf(ln)
where FirstNames.IndexOf(fn) == Birthdates.IndexOf(bd)
select new { First = fn, Last = ln, Birthdate = bd.Date };
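However, I have stress-tested this code (with each List<string> and List<DateTime> loaded with a few million records) and I run into a System.OutOfMemoryException.
Is there another way to write this query with LINQ more efficiently and get the same result?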
Answer 0 (score: 3)
This is what Zip is for.
var result = FirstNames
.Zip(LastNames, (f,l) => new {f,l})
.Zip(Birthdates, (fl, b) => new {First=fl.f, Last = fl.l, BirthDate = b});
Regarding scaling:
int count = 50000000;
var FirstNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var LastNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var BirthDates = Enumerable.Range(0, count).Select(x=> DateTime.Now.AddSeconds(x));
var sw = new Stopwatch();
sw.Start();
var result = FirstNames
.Zip(LastNames, (f,l) => new {f,l})
.Zip(BirthDates, (fl, b) => new {First=fl.f, Last = fl.l, BirthDate = b});
foreach(var r in result)
{
var x = r;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds); // Returns 69191 on my machine.
Whereas these run out of memory:
int count = 50000000;
var FirstNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var LastNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var BirthDates = Enumerable.Range(0, count).Select(x=> DateTime.Now.AddSeconds(x));
var sw = new Stopwatch();
sw.Start();
var FirstNamesList = FirstNames.ToList(); // Blows up in 32-bit .NET with out of Memory
var LastNamesList = LastNames.ToList();
var BirthDatesList = BirthDates.ToList();
var result = Enumerable.Range(0, FirstNamesList.Count)
.Select(i => new
{
First = FirstNamesList[i],
Last = LastNamesList[i],
BirthDate = BirthDatesList[i] // same property name as the query below, so both share one anonymous type
});
result = BirthDatesList.Select((bd, i) => new
{
First = FirstNamesList[i],
Last = LastNamesList[i],
BirthDate = bd
});
foreach(var r in result)
{
var x = r;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
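At lower values, the cost of converting the enumerables to lists is much higher than the cost of creating the additional objects. Zip was approximately 30% faster than the indexed versions. As you add more columns, Zip's advantage will likely shrink.
The performance characteristics are also very different. The Zip routine starts producing results almost immediately, whereas the others only start once the entire enumerables have been read and converted to lists. So if you take the results and page through them with .Skip(x).Take(y), or check whether something exists with .Any(...), Zip will be much faster, because it doesn't have to convert the entire enumerable.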
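To illustrate the streaming behaviour described above, here is a minimal sketch. The `LazyZipDemo` class and its `Counted` helper are hypothetical names introduced only for this illustration; the helper counts how many items are actually pulled from a large source when the zipped result is paged with Skip and Take.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyZipDemo
{
    // Hypothetical helper: yields 0..count-1 and invokes onPull for every item requested.
    private static IEnumerable<int> Counted(int count, Action onPull)
    {
        for (int i = 0; i < count; i++)
        {
            onPull();
            yield return i;
        }
    }

    // Zips two large lazy sequences, takes one small page, and reports
    // the page size and how many items were pulled from the first source.
    public static (int Count, int Pulled) Run()
    {
        int pulled = 0;
        var numbers = Counted(1_000_000, () => pulled++);
        var texts = Enumerable.Range(0, 1_000_000).Select(i => i.ToString());

        var page = numbers
            .Zip(texts, (n, s) => new { Number = n, Text = s })
            .Skip(10)
            .Take(5)
            .ToList();

        return (page.Count, pulled);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"Page size: {result.Count}");
        Console.WriteLine($"Items pulled from the source: {result.Pulled}");
    }
}
```

The number of items pulled should be a tiny fraction of the million available, because nothing is materialized before the page is requested.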
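Lastly, if this becomes performance-critical and you need to materialize many results, you could consider extending Zip to handle more than two enumerables (shamelessly stolen from Jon Skeet - https://codeblog.jonskeet.uk/2011/01/14/reimplementing-linq-to-objects-part-35-zip/):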
// Must be declared in a static class so it can be called as an extension method.
public static IEnumerable<TResult> Zip<TFirst, TSecond, TThird, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
IEnumerable<TThird> third,
Func<TFirst, TSecond, TThird, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
using (IEnumerator<TThird> iterator3 = third.GetEnumerator())
{
while (iterator1.MoveNext() && iterator2.MoveNext() && iterator3.MoveNext())
{
yield return resultSelector(iterator1.Current, iterator2.Current, iterator3.Current);
}
}
}
Then you can do this:
var result = FirstNames
.Zip(LastNames, Birthdates, (f,l,b) => new {First=f,Last=l,BirthDate=b});
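Now you don't even have the problem of creating the intermediate objects, so you get the best of all worlds.
Or use the implementation here to handle any number: Zip multiple/abitrary number of enumerables in C#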
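To make the three-sequence version concrete, here is a self-contained sketch that puts the extension method in a static class and runs it on the first three entries of the question's data. The class name `ZipThreeExample` and the `BuildStudents` method are assumptions made for this illustration, and the dates are built with the `DateTime` constructor rather than `Convert.ToDateTime` so that the result does not depend on the machine's culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipThreeExample
{
    // Three-sequence Zip with the same shape as the method shown above.
    public static IEnumerable<TResult> Zip<TFirst, TSecond, TThird, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        IEnumerable<TThird> third,
        Func<TFirst, TSecond, TThird, TResult> resultSelector)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        using (var e3 = third.GetEnumerator())
        {
            // Stops as soon as any one of the sequences runs out.
            while (e1.MoveNext() && e2.MoveNext() && e3.MoveNext())
            {
                yield return resultSelector(e1.Current, e2.Current, e3.Current);
            }
        }
    }

    // Hypothetical helper: combines a shortened copy of the question's lists.
    public static List<string> BuildStudents()
    {
        var firstNames = new List<string> { "Bob", "Sondra", "Avery" };
        var lastNames = new List<string> { "Anderson", "Carlson", "Vickers" };
        var birthdates = new List<DateTime>
        {
            new DateTime(1980, 11, 12),
            new DateTime(1978, 9, 16),
            new DateTime(1985, 5, 18)
        };

        return firstNames
            .Zip(lastNames, birthdates, (f, l, b) => $"{f} {l}, born {b:yyyy-MM-dd}")
            .ToList();
    }

    public static void Main()
    {
        foreach (var student in BuildStudents())
        {
            Console.WriteLine(student);
        }
    }
}
```

Each element is produced in a single pass over the three sources, with no intermediate pair object per row.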
Answer 1 (score: 3)
Another option is to use the Select overload that also provides the element's index:
var result = Birthdates.Select((bd, i) => new
{
First = FirstNames[i],
Last = LastNames[i],
Birthdate = bd
});
Answer 2 (score: 2)
Yep, use a range generator:
var result = Enumerable.Range(0, FirstNames.Count)
.Select(i => new
{
First = FirstNames[i],
Last = LastNames[i],
Birthdate = Birthdates[i]
});
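One behavioural difference is worth keeping in mind when choosing between the indexed approaches and Zip: if the lists ever differ in length, indexing into the shorter list throws, while Zip silently stops at the shortest sequence. The sketch below shows both outcomes; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LengthMismatchDemo
{
    // Zip stops at the end of the shorter sequence.
    public static int ZipCount(List<string> first, List<string> last) =>
        first.Zip(last, (f, l) => new { First = f, Last = l }).Count();

    // The range-based version indexes past the end of the shorter list.
    public static bool IndexedThrows(List<string> first, List<string> last)
    {
        try
        {
            // ToList forces enumeration; last[i] fails once i passes the shorter list.
            Enumerable.Range(0, first.Count)
                .Select(i => new { First = first[i], Last = last[i] })
                .ToList();
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var first = new List<string> { "Bob", "Sondra", "Avery" };
        var last = new List<string> { "Anderson", "Carlson" }; // one entry short on purpose

        Console.WriteLine(ZipCount(first, last));      // 2
        Console.WriteLine(IndexedThrows(first, last)); // True
    }
}
```

If mismatched lengths would indicate a data problem, it may be safer to compare the Count values up front than to rely on either behaviour.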