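There are two tables, Events and Octaves:

+---------+------+
| EventId | Time |
+---------+------+

+----------+---------+-----------+-------+
| OctaveId | EventId | Frequency | Value |
+----------+---------+-----------+-------+

On average each event has 10 octaves, and an event is recorded every 10 seconds. There are currently about 400k events and 4 million octaves. I want to filter the events within a specific time range, aggregate them by hour, and return the average of the octave values that share the same frequency. The EF6 LINQ code I am using is: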
_context.Events
.Where(x => x.Time >= afterDate)
.Where(x => x.Time <= beforeDate)
.Select(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour, data = x.Data })
.GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
.Where(x => x.Any())
.Select(x => new
{
Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
Data = x.SelectMany(y => y.data).GroupBy(y => new { frequency = y.Frequency }).Select(y => new
{
frequency = y.Key.frequency,
value = Math.Round(y.Average(z => z.Value), 1),
})
})
.OrderByDescending(m => m.Time)
.Take(limit);
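This works, but only when the time span is very small (a few hours). If it grows to a few days, the query seems to run forever. Am I asking too much of SQL Server? Or is there a better way to run this query or structure my data? If I remove the SelectMany(...).GroupBy(...) part, it is no longer nearly as slow.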
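To make the intended result concrete, here is a minimal sketch of the same aggregation run with LINQ to Objects against in-memory data. The `Event` and `Octave` classes, the `Data` navigation property as a list, the `double` column types, and the sample values are all assumptions for illustration; they are not the real entity model, and this says nothing about how EF6 translates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for rows of the Octaves and Events tables.
public class Octave
{
    public double Frequency { get; set; }   // assumed type
    public double Value { get; set; }       // assumed type
}

public class Event
{
    public DateTime Time { get; set; }
    public List<Octave> Data { get; set; } = new List<Octave>();
}

public static class HourlyAverages
{
    public static void Main()
    {
        // Made-up sample data: two events in the 10:00 hour, one in the 11:00 hour.
        var events = new List<Event>
        {
            new Event { Time = new DateTime(2018, 1, 1, 10, 0, 10),
                        Data = { new Octave { Frequency = 63, Value = 40.0 },
                                 new Octave { Frequency = 125, Value = 50.0 } } },
            new Event { Time = new DateTime(2018, 1, 1, 10, 0, 20),
                        Data = { new Octave { Frequency = 63, Value = 42.0 },
                                 new Octave { Frequency = 125, Value = 51.0 } } },
            new Event { Time = new DateTime(2018, 1, 1, 11, 0, 10),
                        Data = { new Octave { Frequency = 63, Value = 44.0 } } },
        };

        var hourly = events
            // Bucket each event by the start of its hour.
            .GroupBy(e => new DateTime(e.Time.Year, e.Time.Month, e.Time.Day, e.Time.Hour, 0, 0))
            .Select(h => new
            {
                Time = h.Key,
                // Within one hour, average the octave values per frequency.
                Data = h.SelectMany(e => e.Data)
                        .GroupBy(o => o.Frequency)
                        .Select(f => new
                        {
                            frequency = f.Key,
                            value = Math.Round(f.Average(o => o.Value), 1)
                        })
                        .ToList()
            })
            .OrderByDescending(h => h.Time)
            .ToList();

        foreach (var h in hourly)
            foreach (var d in h.Data)
                Console.WriteLine($"{h.Time:yyyy-MM-dd HH}:00 freq={d.frequency} avg={d.value}");
    }
}
```

Each hour bucket carries one averaged value per frequency, which is the shape the EF6 query is meant to return.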
The generated SQL query is:
SELECT
[Project5].[C1] AS [C1],
[Project5].[C2] AS [C2],
[Project5].[C3] AS [C3],
[Project5].[C4] AS [C4],
[Project5].[C5] AS [C5],
[Project5].[C6] AS [C6],
[Project5].[C8] AS [C7],
[Project5].[Frequency] AS [Frequency],
[Project5].[C7] AS [C8]
FROM ( SELECT
[Limit1].[C1] AS [C1],
[Limit1].[C2] AS [C2],
[Limit1].[C3] AS [C3],
[Limit1].[C4] AS [C4],
[Limit1].[C5] AS [C5],
[Limit1].[C6] AS [C6],
CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS float) ELSE ROUND([GroupBy1].[A1], 1) END AS [C7],
[GroupBy1].[K1] AS [Frequency],
CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C8]
FROM (SELECT TOP (10000) [Project4].[C1] AS [C1], [Project4].[C2] AS [C2], [Project4].[C3] AS [C3], [Project4].[C4] AS [C4], [Project4].[C5] AS [C5], [Project4].[C6] AS [C6]
FROM ( SELECT
[Project2].[C1] AS [C1],
[Project2].[C2] AS [C2],
[Project2].[C3] AS [C3],
[Project2].[C4] AS [C4],
1 AS [C5],
convert (datetime2,right('000' + convert(varchar(255), [Project2].[C1]), 4) + '-' + convert(varchar(255), [Project2].[C2]) + '-' + convert(varchar(255), [Project2].[C3]) + ' ' + convert(varchar(255), [Project2].[C4]) + ':' + convert(varchar(255), 0) + ':' + str(cast(0 as float(53)), 10, 7), 121) AS [C6]
FROM ( SELECT
[Distinct1].[C1] AS [C1],
[Distinct1].[C2] AS [C2],
[Distinct1].[C3] AS [C3],
[Distinct1].[C4] AS [C4]
FROM ( SELECT DISTINCT
DATEPART (year, [Extent1].[TimeEnd]) AS [C1],
DATEPART (month, [Extent1].[TimeEnd]) AS [C2],
DATEPART (day, [Extent1].[TimeEnd]) AS [C3],
DATEPART (hour, [Extent1].[TimeEnd]) AS [C4]
FROM [dbo].[Events] AS [Extent1]
WHERE ([Extent1].[TimeEnd] >= @p__linq__1) AND ([Extent1].[TimeEnd] <= @p__linq__2)
) AS [Distinct1]
) AS [Project2]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Events] AS [Extent2]
WHERE ([Extent2].[TimeEnd] >= @p__linq__1) AND ([Extent2].[TimeEnd] <= @p__linq__2) AND (([Project2].[C1] = (DATEPART (year, [Extent2].[TimeEnd]))) OR (([Project2].[C1] IS NULL) AND (DATEPART (year, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C2] = (DATEPART (month, [Extent2].[TimeEnd]))) OR (([Project2].[C2] IS NULL) AND (DATEPART (month, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C3] = (DATEPART (day, [Extent2].[TimeEnd]))) OR (([Project2].[C3] IS NULL) AND (DATEPART (day, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C4] = (DATEPART (hour, [Extent2].[TimeEnd]))) OR (([Project2].[C4] IS NULL) AND (DATEPART (hour, [Extent2].[TimeEnd]) IS NULL)))
)
) AS [Project4]
ORDER BY [Project4].[C6] DESC ) AS [Limit1]
OUTER APPLY (SELECT
[Extent4].[Frequency] AS [K1],
AVG([Extent4].[Value]) AS [A1]
FROM [dbo].[Events] AS [Extent3]
INNER JOIN [dbo].[Octaves] AS [Extent4] ON [Extent3].[EventId] = [Extent4].[EventId]
WHERE ([Extent3].[TimeEnd] >= @p__linq__1) AND ([Extent3].[TimeEnd] <= @p__linq__2) AND (([Limit1].[C1] = (DATEPART (year, [Extent3].[TimeEnd]))) OR (([Limit1].[C1] IS NULL) AND (DATEPART (year, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C2] = (DATEPART (month, [Extent3].[TimeEnd]))) OR (([Limit1].[C2] IS NULL) AND (DATEPART (month, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C3] = (DATEPART (day, [Extent3].[TimeEnd]))) OR (([Limit1].[C3] IS NULL) AND (DATEPART (day, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C4] = (DATEPART (hour, [Extent3].[TimeEnd]))) OR (([Limit1].[C4] IS NULL) AND (DATEPART (hour, [Extent3].[TimeEnd]) IS NULL)))
GROUP BY [Extent4].[Frequency] ) AS [GroupBy1]
) AS [Project5]
ORDER BY [Project5].[C6] DESC, [Project5].[C1] ASC, [Project5].[C2] ASC, [Project5].[C3] ASC, [Project5].[C4] ASC, [Project5].[C8] ASC
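Update 1
I tried "flipping" the query so that it queries the octaves directly, and I got better results. I first group them by date and frequency and compute the average, and then I group the result again by time. It is not elegant at all, but it is the first solution that actually works. If the grouping is done in a different order (for example, first by time, then by frequency, and then averaging), it still does not work.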
_context.Octaves
.Where(x => x.Event.Time >= afterDate)
.Where(x => x.Event.Time <= beforeDate)
.GroupBy(x => new { year = x.Event.Time.Year, month = x.Event.Time.Month, day = x.Event.Time.Day, hour = x.Event.Time.Hour, freq = x.Frequency })
.Select(x => new
{
year = x.Key.year,
month = x.Key.month,
day = x.Key.day,
hour = x.Key.hour,
freq = x.Key.freq,
value = Math.Round(x.Average(y => y.Value), 1)
})
.GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
.Select(x => new
{
timeEnd = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
data = x.Select(y=> new {freq = y.freq, value = y.value })
})
.OrderByDescending(m => m.timeEnd)
.Take(limit)
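The two-stage grouping described in this update can be sketched in memory as follows. This is only an illustration of the order of operations (average per hour and frequency first, regroup by hour second); the `OctaveRow` class, its `double` columns, and the sample values are hypothetical, and the sketch does not reproduce what EF6 sends to SQL Server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row: one octave together with the time of its parent event.
public class OctaveRow
{
    public DateTime Time { get; set; }
    public double Frequency { get; set; }   // assumed type
    public double Value { get; set; }       // assumed type
}

public static class FlippedGrouping
{
    public static void Main()
    {
        // Made-up sample data.
        var rows = new List<OctaveRow>
        {
            new OctaveRow { Time = new DateTime(2018, 1, 1, 10, 0, 10), Frequency = 63,  Value = 40.0 },
            new OctaveRow { Time = new DateTime(2018, 1, 1, 10, 0, 20), Frequency = 63,  Value = 42.0 },
            new OctaveRow { Time = new DateTime(2018, 1, 1, 10, 0, 10), Frequency = 125, Value = 50.0 },
            new OctaveRow { Time = new DateTime(2018, 1, 1, 11, 0, 10), Frequency = 63,  Value = 44.0 },
        };

        // Stage 1: a single flat grouping keyed by (hour bucket, frequency).
        var flat = rows
            .GroupBy(r => new
            {
                Hour = new DateTime(r.Time.Year, r.Time.Month, r.Time.Day, r.Time.Hour, 0, 0),
                r.Frequency
            })
            .Select(g => new
            {
                g.Key.Hour,
                g.Key.Frequency,
                Value = Math.Round(g.Average(r => r.Value), 1)
            });

        // Stage 2: regroup the already-averaged rows by hour to build the nested shape.
        var hourly = flat
            .GroupBy(a => a.Hour)
            .Select(g => new
            {
                timeEnd = g.Key,
                data = g.Select(a => new { freq = a.Frequency, value = a.Value }).ToList()
            })
            .OrderByDescending(h => h.timeEnd)
            .ToList();

        foreach (var h in hourly)
            foreach (var d in h.data)
                Console.WriteLine($"{h.timeEnd:yyyy-MM-dd HH}:00 freq={d.freq} avg={d.value}");
    }
}
```

Because the averaging happens in one flat grouping, the second stage only reshapes a small set of rows, one per hour and frequency.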
Answer 0 (score: 0)
I am not sure, but you may want to give this a try. It might turn out to be worse.
_context.Events.AsNoTracking()
    .Where(x => x.Time >= afterDate && x.Time <= beforeDate)
    // Group on the date parts of Time; Event has no year/month/day/hour members of its own.
    .GroupBy(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour })
    .Select(x => new
    {
        Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        // Flatten the octaves of every event in the hour, then average per frequency.
        Data = x.SelectMany(e => e.Data)
            .GroupBy(o => o.Frequency)
            .Select(g => new
            {
                frequency = g.Key,
                value = Math.Round(g.Average(o => o.Value), 1)
            })
    })
    .OrderByDescending(m => m.Time)
    .Take(limit);