Say I have this hypothetical many-to-many relationship:
    public class Paper
    {
        public int Id { get; set; }
        public string Title { get; set; }
        public virtual ICollection<Author> Authors { get; set; }
    }

    public class Author
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public virtual ICollection<Paper> Papers { get; set; }
    }
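I want to use LINQ to build a query that gives me each author's "popularity" relative to the other authors, defined as the number of papers the author contributed to divided by the total number of paper contributions across all authors. I have come up with a couple of queries to achieve this.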
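Stated as a formula, under the assumption that the symbols are read as follows (they are introduced here for illustration only): $A$ is the set of all authors and $P_a$ is the set of papers that author $a$ contributed to.

```latex
\[
  \mathrm{popularity}(a) \;=\; \frac{|P_a|}{\sum_{b \in A} |P_b|}
\]
```

For example, if one author has 3 papers and another has 1, their popularities are 3/4 and 1/4. A co-authored paper is counted once per author in the denominator, which matches `db.Authors.Sum(a => a.Papers.Count)`.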
Option 1:
    var query1 = from author in db.Authors
                 let sum = (double)db.Authors.Sum(a => a.Papers.Count)
                 select new
                 {
                     Author = author,
                     Popularity = author.Papers.Count / sum
                 };
Option 2:
    var temp = db.Authors.Select(a => new
    {
        Auth = a,
        Contribs = a.Papers.Count
    });
    var query2 = temp.Select(a => new
    {
        Author = a.Auth,
        Popularity = a.Contribs / (double)temp.Sum(a2 => a2.Contribs)
    });
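Basically, my question is: which of these is more efficient, and is there another single query that is more efficient still? How do they compare with two separate queries, like this: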
    double sum = db.Authors.Sum(a => a.Papers.Count);
    var query3 = from author in db.Authors
                 select new
                 {
                     Author = author,
                     Popularity = author.Papers.Count / sum
                 };
Answer 0 (score: 0)
Well, first of all, you can try it yourself and see which one takes the longest.
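A minimal sketch of how one might time the candidates, assuming a hypothetical helper named `QueryTimer.Time` (it is not part of LINQ or Entity Framework). It relies on `System.Diagnostics.Stopwatch` and forces enumeration with `ToList()`, because a LINQ query does no work until it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QueryTimer
{
    // Hypothetical helper: executes the query `runs` times and returns
    // the average elapsed time per run in milliseconds.
    public static double Time<T>(Func<IEnumerable<T>> buildQuery, int runs = 5)
    {
        if (buildQuery == null) throw new ArgumentNullException(nameof(buildQuery));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        // Warm-up run so one-time costs (JIT, query compilation) are not measured.
        buildQuery().ToList();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            // ToList() forces execution; building a deferred query alone does no work.
            buildQuery().ToList();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / runs;
    }
}
```

It could then be called as `QueryTimer.Time(() => query1)` for each candidate. Timings against a real database also depend on caching and network latency, so several runs on realistic data are more telling than a single measurement.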
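The first thing to check is that the queries translate fully to SQL, or as close to it as possible, so that the data is not all loaded into memory just to apply these calculations.
That said, I think Option 2 may be your best shot, with one more optimization: cache the total number of contributed papers. That way you only need a single call to the db to fetch the authors you need, and the rest runs in your code, where you can parallelize it or do whatever else you need to speed it up.
So something like this (sorry, I prefer writing LINQ in the fluent style):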
    // You could load only the fields you need instead of whole entities,
    // e.g. just the name and Papers.Count; that would be a further optimization.
    // ToList() runs the query once and materializes the authors in memory;
    // Include (System.Data.Entity or Microsoft.EntityFrameworkCore) loads Papers in the same round trip.
    var allAuthors = db.Authors.Include(a => a.Papers).ToList();
    var totalPaperCount = allAuthors.Sum(x => x.Papers.Count);
    var theEndResult = allAuthors.Select(a => new
    {
        Author = a,
        Popularity = a.Papers.Count / (double)totalPaperCount
    });
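The in-memory step can be sketched without a database. The following is an illustration only: `PopularityDemo.Compute` and its dictionary input (author name mapped to paper count) are hypothetical names introduced here, standing in for the data the single db call would return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PopularityDemo
{
    // Hypothetical helper: maps each author name to its share of all paper contributions.
    public static Dictionary<string, double> Compute(IReadOnlyDictionary<string, int> paperCounts)
    {
        // Compute the total once, outside the projection,
        // so it is not re-evaluated for every author.
        double total = paperCounts.Values.Sum();

        // Guard against division by zero when nobody has any papers.
        if (total == 0)
        {
            return paperCounts.ToDictionary(p => p.Key, p => 0.0);
        }

        return paperCounts.ToDictionary(p => p.Key, p => p.Value / total);
    }
}
```

With the input `{ "Alice": 3, "Bob": 1 }` this yields 0.75 for Alice and 0.25 for Bob. Hoisting the total out of the projection is the same idea as the two-query variant in the question, applied to data already in memory.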
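Answer 1 (score: 0)
Options 1 and 2 should generate the same SQL. For readability, I would go with Option 1. Option 3 (the two separate queries) will generate two SQL statements and be slightly slower.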