EF6 aggregation over a large data set

Time: 2017-05-23 15:43:08

Tags: c# linq entity-framework-6

I have two tables, Events and Octaves:

+---------+-------+
| EventId | Time  |
+---------+-------+

+----------+---------+-----------+-------+
| OctaveId | EventId | Frequency | Value |
+----------+---------+-----------+-------+

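On average each event has 10 octaves, and one event is recorded every 10 seconds. There are currently about 400k events and 4 million octaves. I want to filter the events within a specific time range, aggregate them by hour, and return, for each hour, the average value of the octaves that share the same frequency. The EF6 LINQ code I am using is: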

_context.Events
    .Where(x => x.Time >= afterDate)
    .Where(x => x.Time <= beforeDate)
    .Select(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour, data = x.Data })
    .GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
    .Where(x => x.Any())
    .Select(x => new
    {
        Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        Data = x.SelectMany(y => y.data).GroupBy(y => new { frequency = y.Frequency }).Select(y => new
        {
            frequency = y.Key.frequency,
            value = Math.Round(y.Average(z => z.Value), 1),
        })
    })
    .OrderByDescending(m => m.Time)
    .Take(limit);

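This works, but only when the time span is very small (a few hours). If the span grows to a few days, the query seems to run forever. Am I asking too much of SQL Server? Or is there a better way to run this query or to structure my data? If I remove the SelectMany(...).GroupBy(...) part, the query is no longer that slow.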

The generated SQL query is:

SELECT 
    [Project5].[C1] AS [C1], 
    [Project5].[C2] AS [C2], 
    [Project5].[C3] AS [C3], 
    [Project5].[C4] AS [C4], 
    [Project5].[C5] AS [C5], 
    [Project5].[C6] AS [C6], 
    [Project5].[C8] AS [C7], 
    [Project5].[Frequency] AS [Frequency], 
    [Project5].[C7] AS [C8]
    FROM ( SELECT 
        [Limit1].[C1] AS [C1], 
        [Limit1].[C2] AS [C2], 
        [Limit1].[C3] AS [C3], 
        [Limit1].[C4] AS [C4], 
        [Limit1].[C5] AS [C5], 
        [Limit1].[C6] AS [C6], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS float) ELSE ROUND([GroupBy1].[A1], 1) END AS [C7], 
        [GroupBy1].[K1] AS [Frequency], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C8]
        FROM   (SELECT TOP (10000) [Project4].[C1] AS [C1], [Project4].[C2] AS [C2], [Project4].[C3] AS [C3], [Project4].[C4] AS [C4], [Project4].[C5] AS [C5], [Project4].[C6] AS [C6]
            FROM ( SELECT 
                [Project2].[C1] AS [C1], 
                [Project2].[C2] AS [C2], 
                [Project2].[C3] AS [C3], 
                [Project2].[C4] AS [C4], 
                1 AS [C5], 
                convert (datetime2,right('000' + convert(varchar(255), [Project2].[C1]), 4) + '-' + convert(varchar(255), [Project2].[C2]) + '-' + convert(varchar(255), [Project2].[C3]) + ' ' + convert(varchar(255), [Project2].[C4]) + ':' + convert(varchar(255), 0) + ':' + str(cast(0 as float(53)), 10, 7), 121) AS [C6]
                FROM ( SELECT 
                    [Distinct1].[C1] AS [C1], 
                    [Distinct1].[C2] AS [C2], 
                    [Distinct1].[C3] AS [C3], 
                    [Distinct1].[C4] AS [C4]
                    FROM ( SELECT DISTINCT 
                        DATEPART (year, [Extent1].[TimeEnd]) AS [C1], 
                        DATEPART (month, [Extent1].[TimeEnd]) AS [C2], 
                        DATEPART (day, [Extent1].[TimeEnd]) AS [C3], 
                        DATEPART (hour, [Extent1].[TimeEnd]) AS [C4]
                        FROM [dbo].[Events] AS [Extent1]
                        WHERE ([Extent1].[TimeEnd] >= @p__linq__1) AND ([Extent1].[TimeEnd] <= @p__linq__2)
                    )  AS [Distinct1]
                )  AS [Project2]
                WHERE  EXISTS (SELECT 
                    1 AS [C1]
                    FROM [dbo].[Events] AS [Extent2]
                    WHERE ([Extent2].[TimeEnd] >= @p__linq__1) AND ([Extent2].[TimeEnd] <= @p__linq__2) AND (([Project2].[C1] = (DATEPART (year, [Extent2].[TimeEnd]))) OR (([Project2].[C1] IS NULL) AND (DATEPART (year, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C2] = (DATEPART (month, [Extent2].[TimeEnd]))) OR (([Project2].[C2] IS NULL) AND (DATEPART (month, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C3] = (DATEPART (day, [Extent2].[TimeEnd]))) OR (([Project2].[C3] IS NULL) AND (DATEPART (day, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C4] = (DATEPART (hour, [Extent2].[TimeEnd]))) OR (([Project2].[C4] IS NULL) AND (DATEPART (hour, [Extent2].[TimeEnd]) IS NULL)))
                )
            )  AS [Project4]
            ORDER BY [Project4].[C6] DESC ) AS [Limit1]
        OUTER APPLY  (SELECT 
            [Extent4].[Frequency] AS [K1], 
            AVG([Extent4].[Value]) AS [A1]
            FROM  [dbo].[Events] AS [Extent3]
            INNER JOIN [dbo].[Octaves] AS [Extent4] ON [Extent3].[EventId] = [Extent4].[EventId]
            WHERE ([Extent3].[TimeEnd] >= @p__linq__1) AND ([Extent3].[TimeEnd] <= @p__linq__2) AND (([Limit1].[C1] = (DATEPART (year, [Extent3].[TimeEnd]))) OR (([Limit1].[C1] IS NULL) AND (DATEPART (year, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C2] = (DATEPART (month, [Extent3].[TimeEnd]))) OR (([Limit1].[C2] IS NULL) AND (DATEPART (month, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C3] = (DATEPART (day, [Extent3].[TimeEnd]))) OR (([Limit1].[C3] IS NULL) AND (DATEPART (day, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C4] = (DATEPART (hour, [Extent3].[TimeEnd]))) OR (([Limit1].[C4] IS NULL) AND (DATEPART (hour, [Extent3].[TimeEnd]) IS NULL)))
            GROUP BY [Extent4].[Frequency] ) AS [GroupBy1]
    )  AS [Project5]
    ORDER BY [Project5].[C6] DESC, [Project5].[C1] ASC, [Project5].[C2] ASC, [Project5].[C3] ASC, [Project5].[C4] ASC, [Project5].[C8] ASC

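Update 1

I tried "flipping" the query so that it queries the octaves directly, and got better results. I first group the octaves by date and frequency and compute the average, and then group the result again by time. It is not elegant at all, but it is the first solution that actually works. If the grouping is done in a different order (for example, first by time, then by frequency, then averaging), it still does not work.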

_context.Octaves
    .Where(x => x.Event.Time >= afterDate)
    .Where(x => x.Event.Time <= beforeDate)
    .GroupBy(x => new { year = x.Event.Time.Year, month = x.Event.Time.Month, day = x.Event.Time.Day, hour = x.Event.Time.Hour, freq = x.Frequency })
    .Select(x => new
    {
        year = x.Key.year,
        month = x.Key.month,
        day = x.Key.day,
        hour = x.Key.hour,
        freq = x.Key.freq,
        value = Math.Round(x.Average(y => y.Value), 1)
    })
    .GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
    .Select(x => new
    {
        timeEnd = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        data = x.Select(y => new { freq = y.freq, value = y.value })
    })
    .OrderByDescending(m => m.timeEnd)
    .Take(limit)
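
To make the shape of the two-stage grouping easier to follow, here is a minimal in-memory sketch of the same idea. It runs against LINQ to Objects, not EF6, so it says nothing about the SQL that EF6 would generate. The types OctaveSample, FrequencyAverage, HourBucket and the helper HourlyAverages.Compute are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an Octaves row joined to its event's Time.
public sealed class OctaveSample
{
    public DateTime Time { get; set; }
    public double Frequency { get; set; }
    public double Value { get; set; }
}

// Hypothetical result types, used instead of anonymous types so they can be returned.
public sealed class FrequencyAverage
{
    public double Frequency { get; set; }
    public double Value { get; set; }
}

public sealed class HourBucket
{
    public DateTime Time { get; set; }
    public List<FrequencyAverage> Data { get; set; }
}

public static class HourlyAverages
{
    public static List<HourBucket> Compute(IEnumerable<OctaveSample> samples, int limit)
    {
        return samples
            // Stage 1: one group per (hour, frequency), averaged and rounded to 1 decimal.
            .GroupBy(s => new
            {
                Hour = new DateTime(s.Time.Year, s.Time.Month, s.Time.Day, s.Time.Hour, 0, 0),
                s.Frequency
            })
            .Select(g => new
            {
                g.Key.Hour,
                g.Key.Frequency,
                Value = Math.Round(g.Average(s => s.Value), 1)
            })
            // Stage 2: regroup the already averaged rows by hour only.
            .GroupBy(r => r.Hour)
            .Select(g => new HourBucket
            {
                Time = g.Key,
                Data = g.Select(r => new FrequencyAverage { Frequency = r.Frequency, Value = r.Value }).ToList()
            })
            // Newest hours first, capped at the requested number of buckets.
            .OrderByDescending(b => b.Time)
            .Take(limit)
            .ToList();
    }
}
```

Calling HourlyAverages.Compute(samples, limit) on a list of samples returns one bucket per hour, newest first, each holding one averaged value per frequency.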

1 Answer:

Answer 0 (score: 0)

I am not sure, but you may want to try this. It might turn out worse; I cannot say.

_context.Events.AsNoTracking()
    .Where(x => x.Time >= afterDate && x.Time <= beforeDate)
    // Group directly on the hour components of Time, without the intermediate projection.
    .GroupBy(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour })
    .Select(x => new
    {
        Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        // Flatten the octaves of every event in the hour, then average per frequency.
        Data = x.SelectMany(e => e.Data)
                .GroupBy(o => o.Frequency)
                .Select(g => new
                {
                    frequency = g.Key,
                    value = Math.Round(g.Average(z => z.Value), 1)
                })
    })
    .OrderByDescending(m => m.Time)
    .Take(limit);