I have been looking at Entity Framework performance, particularly around the use of Includes and the time taken to both generate and execute the various queries.
I am going to detail changes I have made, but please correct me if you think any of these assumptions are wrong.
Firstly, we have around 10,000 items (not many) in a db, and the database is heavily normalised (which results in a large number of navigation properties). Currently the approach is to lazy load everything, and given that requesting one item can spin off tens of db requests, the performance is quite poor, particularly for larger sets of data. (This is an inherited project, and step one is trying to improve performance without significant restructuring.)
So my first step was to take the results of a query and then apply the Includes for the navigation properties only to those results. I know this technically performs 2 queries, but if we have 10,000 items stored and only want to return 10 items, it makes more sense to only include the navigation properties on those 10 items.
Secondly, where multiple includes are used on a query result and that result set size is quite large, it still suffered from poor performance. I have been pragmatic about when to eager load and when to leave the lazy loading in place. My next change was to load query includes in batches, so performing:
query.Include(q => q.MyInclude).Load();
This once again significantly improved performance. Although it makes a few more db calls (one for each batch of includes), it was quicker than one large query, or at the very least it reduced the overhead of Entity Framework trying to produce that large query.
So the code now looks something like this:
var query = ctx.Filters.Where(x => x.SessionId == id)
.Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs);
query
.Include(x => x.ItemNav1)
.Include(x => x.ItemNav2).Load();
query
.Include(x => x.ItemNav3)
.Include(x => x.ItemNav4).Load();
query
.Include(x => x.ItemNav5)
.Include(x => x.ItemNav6).Load();
Now this is reasonably performant; however, it would be nice to improve this further.
I had considered using LoadAsync(), which after a bit more refactoring would be possible and would better fit with the rest of the architecture.
However, you can only execute one query at a time on a db context. So I was wondering if there was any way to create a new db context, perform LoadAsync() on each group of navigation properties (asynchronously) and then concatenate all of the results.
I know technically how you might create a new context and fire off a LoadAsync() for each navigation group, but not how to concatenate the results. I don't know whether it is actually possible or whether it goes against good practice.
So my question is: is this possible, or is there another way I can further improve performance? I'm trying to stick with what Entity Framework provides rather than crafting some stored procs. Thanks
UPDATE
Regarding the performance disparity I'm seeing between using all Includes in one statement and loading them in small groups: this is for a query that returns 6000 items (using SQL Profiler and VS diagnostics to determine times).
Grouped Includes: in total takes ~8 seconds to execute the includes.
Includes in one statement: SQL query is taking ~30 seconds to load (often getting timeouts).
After a bit more investigation, I don't think there is much overhead when EF converts the SQL results to models. However, we have seen nearly 500ms taken for EF to generate complex queries, which isn't ideal, but I'm not sure this can be resolved.
UPDATE 2
With Ivan's help and following this https://msdn.microsoft.com/en-gb/data/hh949853.aspx we were able to improve things further, particularly using SelectMany. I would highly recommend the MSDN article to anyone attempting to improve their EF performance.
Answer 0 (score: 7)
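Your second approach relies on the EF navigation property fixup process. The problem, though, is that every
query.Include(q => q.ItemNavN).Load();
statement will also include all the master record data along with the related entity data.
Using the same basic idea, one potential improvement could be to execute one Load per navigation property, replacing the Include with either Select (for references) or SelectMany (for collections), similar to how EF Core processes Includes internally.
Taking your second approach example, you could try issuing query.Select(x => x.ItemNavN).Load() (or SelectMany for a collection) once per navigation property, followed by query.ToList(), and compare the performance.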
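A rough sketch of that pattern, assuming EF6 (System.Data.Entity) and a tracking context. The entity shapes (Item, Nav1, Nav2) and the RelatedDataLoader helper are hypothetical stand-ins for the question's model, not anything EF provides:

```csharp
using System.Collections.Generic;
using System.Data.Entity; // EF6; provides the Load() extension on IQueryable
using System.Linq;

// Hypothetical entity shapes, for illustration only.
public class Nav1 { public int Id { get; set; } }
public class Nav2 { public int Id { get; set; } public int ItemId { get; set; } }

public class Item
{
    public int Id { get; set; }
    public virtual Nav1 ItemNav1 { get; set; }              // reference navigation
    public virtual ICollection<Nav2> ItemNav2 { get; set; } // collection navigation
}

public static class RelatedDataLoader
{
    // 'query' is the already-filtered IQueryable<Item> (e.g. the Filters/Items join
    // from the question), built on a context with change tracking enabled.
    public static List<Item> LoadWithRelated(IQueryable<Item> query)
    {
        // One round trip per navigation property; each returns only the related rows.
        query.Select(x => x.ItemNav1).Load();     // reference: Select
        query.SelectMany(x => x.ItemNav2).Load(); // collection: SelectMany

        // Materialize the items last; relationship fixup should connect them
        // to the related entities that are already tracked.
        return query.ToList();
    }
}
```

This only works through the change tracker, so it should not be combined with AsNoTracking(). With lazy loading left on, EF6 may still issue a query when a collection populated this way is first accessed, since fixup does not mark the collection as loaded.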
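Answer 1 (score: 1)
"I know this technically performs 2 queries, but if we have 10,000 items stored, but only want to return 10 items, it makes more sense to only include the navigation properties on those 10 items."
I think you have misunderstood how the .Include operator works. In the code below, the DB will return only the items we want; there is no "extra data".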
ctx.Items.Include(e => e.ItemNav1)
.Include(e => e.ItemNav2)
.Include(e => e.ItemNav3)
.Include(e => e.ItemNav4)
.Include(e => e.ItemNav5)
.Include(e => e.ItemNav6)
.Where(<filter criteria>)
.ToList();
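If only 10 items match the filter criteria, data will be brought back only for those items. Behind the scenes, .Include is roughly analogous to a SQL JOIN. There are still performance considerations, but there is really no reason (that I know of) to avoid this standard syntax.
If the joins are causing performance problems, the problem may be your database. Do you have proper indexes? Are they fragmented?
Answer 2 (score: 1)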
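For everyone who comes here, I want you to know the following two things:
1. .Select(x => x.NavProp).Load() does not actually load the navigation property if you have turned tracking off.
2. Starting with version 3.0.0, each Include will cause an additional JOIN to be added to SQL queries produced by relational providers, whereas previous versions generated additional SQL queries. This can significantly change the performance of your queries, for better or worse. In particular, LINQ queries with an exceedingly high number of Include operators may need to be broken down into multiple separate LINQ queries in order to avoid the cartesian explosion problem.
Source for both statements: https://docs.microsoft.com/en-us/ef/core/querying/related-data
So it is not true that EF Core does Select and SelectMany in the background. In my case we had a single entity with many navigation properties, and with Include it actually loaded 15,000 rows (yes, that is correct, and it is what I would call a cartesian explosion problem). After I refactored the code to work with Select / SelectMany, that row count was reduced to 118. Query time dropped from 4s to under 1 second, even though we have 20 includes.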
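To see why the row count can drop so sharply, here is a small arithmetic sketch. It assumes one parent row with three sibling collection navigations that are all joined in a single SQL statement, and the child counts are made up for illustration; they are not the figures from my case:

```csharp
using System;
using System.Linq;

static class CartesianSketch
{
    static void Main()
    {
        // Hypothetical child counts for three sibling collections of ONE parent row.
        int[] childCounts = { 10, 20, 30 };

        // One query that JOINs every collection: each combination of children is a row.
        long joinedRows = childCounts.Aggregate(1L, (acc, n) => acc * n);

        // One query per collection (SelectMany-style): the rows simply add up.
        long splitRows = childCounts.Sum(n => (long)n);

        Console.WriteLine($"single joined query: {joinedRows} rows"); // 6000
        Console.WriteLine($"separate queries:    {splitRows} rows");  // 60
    }
}
```

The product only applies to sibling collections joined in one statement; a chain of reference navigations does not multiply rows in this way.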
Hope this helps someone, and thanks to Ivan.
Answer 3 (score: 1)
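There are many ways to improve performance.
I'll put some here, and you can try each one to see which gives you the best results.
You can use System.Diagnostics.Stopwatch to get the elapsed execution time.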
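A minimal timing sketch with Stopwatch. The Measure helper and the stand-in workload are illustrative only; replace the lambda with the EF call you want to time:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class TimingSketch
{
    // Runs the action once and returns the wall-clock time it took.
    static TimeSpan Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        // Stand-in workload; in practice use something like () => query.ToList().
        var elapsed = Measure(() => Enumerable.Range(0, 1000000).Sum(i => (long)i));
        Console.WriteLine($"Elapsed: {elapsed.TotalMilliseconds:F1} ms");
    }
}
```

It is worth timing each option more than once, since the first query on a new context typically also pays for EF's model and query setup.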
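1. Missing indexes (e.g. on foreign keys).
2. Write your query in a view in the database, which is cheaper. You can also create an indexed view for this query.
3. Try to load the data in separate queries: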
context.Configuration.LazyLoadingEnabled = false;
context.ContactTypes.Where(c => c.ContactID== contactId).Load();
context.ContactConnections.Where(c => c.ContactID== contactId).Load();
return context.Contacts.Find(contactId);
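This loads all the required data into the context's cache. Important: turn off lazy loading, because the child collections are not marked as loaded in the entity state manager, and EF will try to trigger lazy loading when you access them.
4. Replace Include with Select().Load():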
var query = ctx.Users.Where(u => u.UserID== userId)
.Join(ctx.Persons, p => p.PersonID, us => us.PersonID, (pr, ur) => ur);
query.Select(x => x.PersonIdentities).Load();
query.Select(x => x.PersonDetails).Load();
var result = query.ToList();
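Remember: tracking must be turned on for the navigation properties to be loaded.
5. Make separate Include calls, limited to 2 Includes per call, then loop to connect the object properties.
Here is an example for fetching a single object: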
var contact = db.Contacts
.Include(p => p.ContactTypes)
.Include(p => p.ContactConnections)
.FirstOrDefault();
var contact2 = db.Contacts
.Include(p => p.ContactIdentities)
.Include(p => p.Person)
.FirstOrDefault();
// Copy the second batch of navigation properties onto the first result.
contact.ContactIdentities = contact2.ContactIdentities;
contact.Person = contact2.Person;
return contact;