实体框架具有多种聚合性能

时间:2016-02-16 15:26:14

标签: c# sql-server entity-framework sql-server-2008 entity-framework-6

我对实体框架查询构建有疑问。

模式

我有一个像这样的表结构:

public void add(Data item) {
        if(!params.contains(item){
           params.add(item);
           notifyItemInserted(getItemCount() - 1);
        }
    }
}

CREATE TABLE [dbo].[DataLogger]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [ProjectID] [bigint] NULL, CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC ) ) CREATE TABLE [dbo].[DCDistributionBox]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [DataLoggerID] [bigint] NOT NULL, CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC ) ) ALTER TABLE [dbo].[DCDistributionBox] ADD CONSTRAINT [FK_DCDistributionBox_DataLogger] FOREIGN KEY([DataLoggerID]) REFERENCES [dbo].[DataLogger] ([ID]) CREATE TABLE [dbo].[DCString] ( [ID] [bigint] IDENTITY(1,1) NOT NULL, [DCDistributionBoxID] [bigint] NOT NULL, [CurrentMPP] [decimal](18, 2) NULL, CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC ) ) ALTER TABLE [dbo].[DCString] ADD CONSTRAINT [FK_DCString_DCDistributionBox] FOREIGN KEY([DCDistributionBoxID]) REFERENCES [dbo].[DCDistributionBox] ([ID]) CREATE TABLE [dbo].[StringData]( [DCStringID] [bigint] NOT NULL, [TimeStamp] [datetime] NOT NULL, [DCCurrent] [decimal](18, 2) NULL, CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC) ) 表有以下存储统计信息:

  • 数据空间:26,901.86 MB
  • 行数:131,827,749
  • 分区:true
  • 分区数:62

用法

我现在想要对[StringData]表中的数据进行分组并进行一些聚合。

在纯SQL中,它看起来像这样:

[StringData]

超时时间: 653ms

现在我创建了一个实体框架等价物(我想):

declare @projectID bigint = 20686;
declare @from datetime = '06.02.2016';
declare @till datetime = '07.02.2016';
declare @interval int = 15;

SELECT 
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [StringData].[TimeStamp] ) / @interval * @interval, 0) AS [TimeStamp]
, AVG([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentAvg]
, MIN([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentMin]
, MAX([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentMax]
, STDEV([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentStDev]
, COUNT(*) AS [Count]

FROM [StringData]
JOIN [DCString] ON [DCString].[ID] = [StringData].[DCStringID]
JOIN [DCDistributionBox] ON [DCDistributionBox].[ID] = [DCString].[DCDistributionBoxID]
JOIN [DataLogger] ON [DataLogger].[ID] = [DCDistributionBox].[DataLoggerID]

WHERE [DataLogger].[ProjectID] = @projectID
AND [StringData].[TimeStamp] >= @from
AND [StringData].[TimeStamp] < @till

GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [StringData].[TimeStamp] ) / @interval * @interval, 0)

执行结果是超时(超过30秒)!?

的尝试

我现在看一下Entity Framework生成的SQL查询,看起来像这样:

var compareData = model.StringDatas
    AsNoTracking()
    .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= from && p.TimeStamp < till)                    
    .Select(d => new
    {
        TimeStamp = d.Key,
        DCCurrentMin = d.Min(v => v.DCCurrent / v.DCString.CurrentMPP),
        DCCurrentMax = d.Max(v => v.DCCurrent / v.DCString.CurrentMPP),
        DCCurrentAvg = d.Average(v => v.DCCurrent / v.DCString.CurrentMPP),
        DCCurrentStDev = DbFunctions.StandardDeviation(d.Select(v => v.DCCurrent / v.DCString.CurrentMPP))
    })
    .ToList();

问题

为什么Entity Framework会将每个聚合分成一个sngle子选择,如何避免这种情况以获得接近原始SQL查询的性能?

更新1

这与SQL查询输出和超时结果完全相同:

SELECT 
1 AS [C1], 
[Project10].[C1] AS [C2], 
[Project10].[C2] AS [C3], 
[Project10].[C3] AS [C4], 
[Project10].[C4] AS [C5], 
[Project10].[C5] AS [C6]
FROM ( SELECT 
    [Project8].[C1] AS [C1], 
    [Project8].[C2] AS [C2], 
    [Project8].[C3] AS [C3], 
    [Project8].[C4] AS [C4], 
    (SELECT 
        STDEV([Project9].[A1]) AS [A1]
        FROM ( SELECT 
            [Project9].[DCCurrent] / [Project9].[CurrentMPP] AS [A1]
            FROM ( SELECT 
                [Extent17].[DCStringID] AS [DCStringID], 
                [Extent17].[DCCurrent] AS [DCCurrent], 
                [Extent18].[ID] AS [ID], 
                [Extent18].[CurrentMPP] AS [CurrentMPP]
                FROM    [dbo].[StringData] AS [Extent17]
                INNER JOIN [dbo].[DCString] AS [Extent18] ON [Extent17].[DCStringID] = [Extent18].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent19] ON [Extent18].[DCDistributionBoxID] = [Extent19].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent20] ON [Extent19].[DataLoggerID] = [Extent20].[ID]
                WHERE (([Extent20].[ProjectID] = @p__linq__0) OR (([Extent20].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent17].[TimeStamp] >= @p__linq__1) AND ([Extent17].[TimeStamp] < @p__linq__2) AND (([Project8].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent17].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project8].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent17].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
            )  AS [Project9]
        )  AS [Project9]) AS [C5]
    FROM ( SELECT 
        [Project6].[C1] AS [C1], 
        [Project6].[C2] AS [C2], 
        [Project6].[C3] AS [C3], 
        (SELECT 
            AVG([Project7].[A1]) AS [A1]
            FROM ( SELECT 
                [Project7].[DCCurrent] / [Project7].[CurrentMPP] AS [A1]
                FROM ( SELECT 
                    [Extent13].[DCStringID] AS [DCStringID], 
                    [Extent13].[DCCurrent] AS [DCCurrent], 
                    [Extent14].[ID] AS [ID], 
                    [Extent14].[CurrentMPP] AS [CurrentMPP]
                    FROM    [dbo].[StringData] AS [Extent13]
                    INNER JOIN [dbo].[DCString] AS [Extent14] ON [Extent13].[DCStringID] = [Extent14].[ID]
                    INNER JOIN [dbo].[DCDistributionBox] AS [Extent15] ON [Extent14].[DCDistributionBoxID] = [Extent15].[ID]
                    INNER JOIN [dbo].[DataLogger] AS [Extent16] ON [Extent15].[DataLoggerID] = [Extent16].[ID]
                    WHERE (([Extent16].[ProjectID] = @p__linq__0) OR (([Extent16].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent13].[TimeStamp] >= @p__linq__1) AND ([Extent13].[TimeStamp] < @p__linq__2) AND (([Project6].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent13].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project6].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent13].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
                )  AS [Project7]
            )  AS [Project7]) AS [C4]
        FROM ( SELECT 
            [Project4].[C1] AS [C1], 
            [Project4].[C2] AS [C2], 
            (SELECT 
                MAX([Project5].[A1]) AS [A1]
                FROM ( SELECT 
                    [Project5].[DCCurrent] / [Project5].[CurrentMPP] AS [A1]
                    FROM ( SELECT 
                        [Extent9].[DCStringID] AS [DCStringID], 
                        [Extent9].[DCCurrent] AS [DCCurrent], 
                        [Extent10].[ID] AS [ID], 
                        [Extent10].[CurrentMPP] AS [CurrentMPP]
                        FROM    [dbo].[StringData] AS [Extent9]
                        INNER JOIN [dbo].[DCString] AS [Extent10] ON [Extent9].[DCStringID] = [Extent10].[ID]
                        INNER JOIN [dbo].[DCDistributionBox] AS [Extent11] ON [Extent10].[DCDistributionBoxID] = [Extent11].[ID]
                        INNER JOIN [dbo].[DataLogger] AS [Extent12] ON [Extent11].[DataLoggerID] = [Extent12].[ID]
                        WHERE (([Extent12].[ProjectID] = @p__linq__0) OR (([Extent12].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent9].[TimeStamp] >= @p__linq__1) AND ([Extent9].[TimeStamp] < @p__linq__2) AND (([Project4].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent9].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project4].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent9].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
                    )  AS [Project5]
                )  AS [Project5]) AS [C3]
            FROM ( SELECT 
                [Project2].[C1] AS [C1], 
                (SELECT 
                    MIN([Project3].[A1]) AS [A1]
                    FROM ( SELECT 
                        [Project3].[DCCurrent] / [Project3].[CurrentMPP] AS [A1]
                        FROM ( SELECT 
                            [Extent5].[DCStringID] AS [DCStringID], 
                            [Extent5].[DCCurrent] AS [DCCurrent], 
                            [Extent6].[ID] AS [ID], 
                            [Extent6].[CurrentMPP] AS [CurrentMPP]
                            FROM    [dbo].[StringData] AS [Extent5]
                            INNER JOIN [dbo].[DCString] AS [Extent6] ON [Extent5].[DCStringID] = [Extent6].[ID]
                            INNER JOIN [dbo].[DCDistributionBox] AS [Extent7] ON [Extent6].[DCDistributionBoxID] = [Extent7].[ID]
                            INNER JOIN [dbo].[DataLogger] AS [Extent8] ON [Extent7].[DataLoggerID] = [Extent8].[ID]
                            WHERE (([Extent8].[ProjectID] = @p__linq__0) OR (([Extent8].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent5].[TimeStamp] >= @p__linq__1) AND ([Extent5].[TimeStamp] < @p__linq__2) AND (([Project2].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent5].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project2].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent5].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
                        )  AS [Project3]
                    )  AS [Project3]) AS [C2]
                FROM ( SELECT 
                    [Distinct1].[C1] AS [C1]
                    FROM ( SELECT DISTINCT 
                        DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [C1]
                        FROM    [dbo].[StringData] AS [Extent1]
                        INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                        INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                        INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                        WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
                    )  AS [Distinct1]
                )  AS [Project2]
            )  AS [Project4]
        )  AS [Project6]
    )  AS [Project8]
)  AS [Project10]

2 个答案:

答案 0 :(得分:2)

在回答How do I get EF6 to generate efficient SQL containing mulitple aggregate columns?时,我注意到了这种行为(错误?)。

我能够解决问题的唯一方法是在group by操作之前引入一个临时投影 ,这同样适用于您的情况:

var query =
    from e in (from d in db.StringDatas.AsNoTracking()
               where d.DCString.DCDistributionBox.DataLogger.ProjectID == projectID
                   && d.TimeStamp >= fromDate && d.TimeStamp < tillDate
               select new { d, s = d.DCString })
    group e by DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, e.d.TimeStamp) / minuteInterval * minuteInterval) into g
    let ratio = g.Select(e => e.d.DCCurrent / e.s.CurrentMPP)
    select new
    {
        TimeStamp = g.Key,
        DCCurrentMin = ratio.Min(),
        DCCurrentMax = ratio.Max(),
        DCCurrentAvg = ratio.Average(),
        DCCurrentStDev = DbFunctions.StandardDeviation(ratio)
    };

EF生成的SQL:

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEV([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A1], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A2], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A3], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] AS [DCCurrent], 
                [Extent2].[CurrentMPP] AS [CurrentMPP]
                FROM    [dbo].[StringDatas] AS [Extent1]
                INNER JOIN [dbo].[DCStrings] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBoxes] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLoggers] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]

答案 1 :(得分:0)

尝试将所需的两个属性分组,而不是整个对象,如下所示:

group new { d.DCCurrent, d.DCString.CurrentMPP } 

然后将v.DCString.CurrentMPP更改为v.CurrentMPP

尝试将UseDatabaseNullSemantics设置为true。

这将消除所有is null检查,并应允许查询优化器为原始sql生成类似的执行计划。

using(var model = new YourContext())
{
   model.Configuration.UseDatabaseNullSemantics = true;
   var query = from d in model.StringDatas
                ...
}