LINQ - filter, group and get min and max values

Date: 2017-10-09 10:03:43

Tags: c# entity-framework linq

Suppose I have an EF entity class that represents values over time:

public class Point
{
    public DateTime DT {get; set;}
    public decimal Value {get; set;}
}

I also have a class that represents a period of time:

public class Period
{
    public DateTime Begin {get; set;}
    public DateTime End {get; set;}
}

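Then I have an array of Period that can contain specific time slices. Let's say it looks like this (the Period objects in the array are always in ascending order):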

var periodSlices = new Period [] 
{
    new Period { Begin = new DateTime(2016, 10, 1), End = new DateTime(2016, 10, 15)},
    new Period { Begin = new DateTime(2016, 10, 16), End = new DateTime(2016, 10, 20)},
    new Period { Begin = new DateTime(2016, 10, 21), End = new DateTime(2016, 12, 30)}
};

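Now, using LINQ to SQL, how can I write a query that picks out the earliest (min) and latest (max) Point in each of the periodSlices? In this example scenario the result should have 3 groups, each with the min and max points (if there are any).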

So what I need is something like IQueryable<Period, IEnumerable<Point>>.

This is how I'm doing it now, but the performance isn't great:

using (var context = new EfDbContext())
{
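    // One database round trip: load every point from the first slice's Begin to the last slice's End
    var periodBegin = periodSlices[0].Begin;
    var periodEnd = periodSlices[periodSlices.Length - 1].End;

    var dbPoints = context.Points.Where(p => p.DT >= periodBegin && p.DT <= periodEnd).ToArray();

    foreach (var slice in periodSlices)
    {
        // In-memory filter: this re-scans all of dbPoints for every slice
        var points = dbPoints.Where(p => p.DT >= slice.Begin && p.DT <= slice.End);

        if (points.Any())
        {
            // MaxBy/MinBy come from an extension library such as MoreLINQ (built into .NET 6 and later)
            var latestValue = points.MaxBy(u => u.DT).Value;
            var earliestValue = points.MinBy(u => u.DT).Value;
        }
    }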
}

Performance is crucial (the faster the better, since I need to filter and group ~100k points).
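Since the periods are stated to be in ascending order (and, in the example, non-overlapping), the in-memory part of the loop above, which re-filters every loaded point once per slice, could in principle be replaced by sorting the points once and sweeping points and slices together. The sketch below is a hypothetical illustration, not code from the thread: tuples stand in for Point and Period, the sample points are made up, and it assumes slices do not overlap.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: tuples stand in for the Period and Point classes.
var slices = new (DateTime Begin, DateTime End)[]
{
    (new DateTime(2016, 10, 1), new DateTime(2016, 10, 15)),
    (new DateTime(2016, 10, 16), new DateTime(2016, 10, 20)),
    (new DateTime(2016, 10, 21), new DateTime(2016, 12, 30)),
};
var points = new (DateTime DT, decimal Value)[]
{
    (new DateTime(2016, 10, 3), 1.0m),
    (new DateTime(2016, 10, 2), 2.0m),
    (new DateTime(2016, 10, 14), 3.0m),
    (new DateTime(2016, 10, 22), 4.0m),
    (new DateTime(2016, 11, 5), 5.0m),
};

// Per-slice results; hasPoints[i] is false when slice i contains no points.
var hasPoints = new bool[slices.Length];
var earliest = new (DateTime DT, decimal Value)[slices.Length];
var latest = new (DateTime DT, decimal Value)[slices.Length];

int s = 0; // index of the current slice
// Sort once (O(n log n)), then make a single pass over points and slices.
foreach (var p in points.OrderBy(x => x.DT))
{
    // Advance past slices that end before this point (assumes non-overlapping slices).
    while (s < slices.Length && p.DT > slices[s].End) s++;
    if (s == slices.Length) break;        // past the last slice
    if (p.DT < slices[s].Begin) continue; // point falls into a gap between slices

    if (!hasPoints[s]) { earliest[s] = p; hasPoints[s] = true; }
    latest[s] = p; // points arrive sorted by DT, so the last one seen is the latest
}

for (int i = 0; i < slices.Length; i++)
{
    Console.WriteLine(hasPoints[i]
        ? $"{slices[i].Begin:d} - {slices[i].End:d}: earliest {earliest[i].Value}, latest {latest[i].Value}"
        : $"{slices[i].Begin:d} - {slices[i].End:d}: no points");
}
```

The sort could also be pushed to the database by ordering the initial query by DT before calling ToArray(). Note that with inclusive End values at midnight, as in the example slices, a point such as 2016-10-15 12:00 falls into no slice, both here and in the original loop.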

2 answers:

Answer 0 (score: 4)

Here is a solution that runs as a single SQL query:

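// One sub-query per slice; each row carries its slice as a Period plus the point's DT
var baseQueries = periodSlices
    .Select(slice => db.Points
        .Select(p => new { Period = new Period { Begin = slice.Begin, End = slice.End }, p.DT })
        .Where(p => p.DT >= p.Period.Begin && p.DT <= p.Period.End)
    );

// Concatenate the sub-queries into a single UNION ALL query
var unionQuery = baseQueries
    .Aggregate(Queryable.Concat);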

var periodQuery = unionQuery
    .GroupBy(p => p.Period)
    .Select(g => new
    {
        Period = g.Key,
        MinDT = g.Min(p => p.DT),
        MaxDT = g.Max(p => p.DT),
    });

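// Join back on DT to fetch the actual points; if several points share the same DT,
// a period gets more than one row
var finalQuery =
    from p in periodQuery
    join pMin in db.Points on p.MinDT equals pMin.DT
    join pMax in db.Points on p.MaxDT equals pMax.DT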
    select new
    {
        Period = p.Period,
        EarliestPoint = pMin,
        LatestPoint = pMax,
    };

For readability, I split the parts of the LINQ query into separate variables. To get the result, only the final query needs to be executed:

var result = finalQuery.ToList();

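Basically, we build a UNION ALL query with one part per slice, then determine the min and max date for each period, and finally fetch the points for those dates. I used joins instead of the "typical" OrderBy(Descending) + FirstOrDefault() inside the grouping, because the latter generates horrible SQL.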
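To make the three steps concrete, here is a hypothetical LINQ to Objects mirror of the same query shape: tag each point with its slice and concatenate (the UNION ALL), take MIN/MAX of DT per slice (the GROUP BY), then join back on DT to fetch the points. It runs in memory rather than through EF, uses tuples in place of Point and Period, and the sample data is made up.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: tuples stand in for the Period and Point classes.
var slices = new (DateTime Begin, DateTime End)[]
{
    (new DateTime(2016, 10, 1), new DateTime(2016, 10, 15)),
    (new DateTime(2016, 10, 16), new DateTime(2016, 10, 20)),
    (new DateTime(2016, 10, 21), new DateTime(2016, 12, 30)),
};
var points = new (DateTime DT, decimal Value)[]
{
    (new DateTime(2016, 10, 3), 1.0m),
    (new DateTime(2016, 10, 2), 2.0m),
    (new DateTime(2016, 10, 14), 3.0m),
    (new DateTime(2016, 10, 22), 4.0m),
    (new DateTime(2016, 11, 5), 5.0m),
};

// Step 1 (UNION ALL): one sequence per slice, each row tagged with its slice, concatenated.
var union = slices
    .Select(slice => points
        .Where(p => p.DT >= slice.Begin && p.DT <= slice.End)
        .Select(p => (Period: slice, p.DT)))
    .Aggregate(Enumerable.Concat);

// Step 2 (GROUP BY): earliest and latest timestamp per slice; empty slices produce no group.
var perPeriod = union
    .GroupBy(x => x.Period)
    .Select(g => (Period: g.Key, MinDT: g.Min(x => x.DT), MaxDT: g.Max(x => x.DT)));

// Step 3 (JOIN): look up the points having those timestamps.
var result = (
    from p in perPeriod
    join pMin in points on p.MinDT equals pMin.DT
    join pMax in points on p.MaxDT equals pMax.DT
    select (p.Period, Earliest: pMin, Latest: pMax)
).ToList();

foreach (var r in result)
    Console.WriteLine($"{r.Period.Begin:d} - {r.Period.End:d}: earliest {r.Earliest.Value}, latest {r.Latest.Value}");
```

As in the SQL version, a slice without points simply does not appear in the result, and because the join matches on DT alone, two points sharing a period's min or max timestamp would produce extra rows for that period.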

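Now, the main question. I can't say whether this is faster than the original approach. It depends on whether the DT column is indexed and on the number of periodSlices, because each slice adds another UNION ALL SELECT from the source table to the query. With 3 slices it looks like this: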

SELECT
    [GroupBy1].[K1] AS [C1],
    [GroupBy1].[K2] AS [C2],
    [GroupBy1].[K3] AS [C3],
    [Extent4].[DT] AS [DT],
    [Extent4].[Value] AS [Value],
    [Extent5].[DT] AS [DT1],
    [Extent5].[Value] AS [Value1]
    FROM    (SELECT
        [UnionAll2].[C1] AS [K1],
        [UnionAll2].[C2] AS [K2],
        [UnionAll2].[C3] AS [K3],
        MIN([UnionAll2].[DT]) AS [A1],
        MAX([UnionAll2].[DT]) AS [A2]
        FROM  (SELECT
            1 AS [C1],
            @p__linq__0 AS [C2],
            @p__linq__1 AS [C3],
            [Extent1].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent1]
            WHERE ([Extent1].[DT] >= @p__linq__0) AND ([Extent1].[DT] <= @p__linq__1)
        UNION ALL
            SELECT
            1 AS [C1],
            @p__linq__2 AS [C2],
            @p__linq__3 AS [C3],
            [Extent2].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent2]
            WHERE ([Extent2].[DT] >= @p__linq__2) AND ([Extent2].[DT] <= @p__linq__3)
        UNION ALL
            SELECT
            1 AS [C1],
            @p__linq__4 AS [C2],
            @p__linq__5 AS [C3],
            [Extent3].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent3]
            WHERE ([Extent3].[DT] >= @p__linq__4) AND ([Extent3].[DT] <= @p__linq__5)) AS [UnionAll2]
        GROUP BY [UnionAll2].[C1], [UnionAll2].[C2], [UnionAll2].[C3] ) AS [GroupBy1]
    INNER JOIN [dbo].[Point] AS [Extent4] ON [GroupBy1].[A1] = [Extent4].[DT]
    INNER JOIN [dbo].[Point] AS [Extent5] ON [GroupBy1].[A2] = [Extent5].[DT]

Answer 1 (score: 2)

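If you want the earliest (min) and latest (max) point in each time slice, the first thing I would look at is making the database do more of the work.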

When you call .ToArray(), it brings all the selected points into memory. That's pointless, since you only want 2 points per slice. So if you did something like:

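foreach (var slice in periodSlices)
{
    var q = context
                .Points
                .Where(p => p.DT >= slice.Begin && p.DT <= slice.End);

    // LINQ to Entities (EF6) cannot translate LastOrDefault, so sort descending for the latest point
    var min = q.OrderBy(x => x.DT).FirstOrDefault();
    var max = q.OrderByDescending(x => x.DT).FirstOrDefault();
}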

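it might perform better.

I say "might" because it depends on which indexes exist in the database and how many points are in each slice. Ultimately, to get really good performance you may need to add an index on the DateTime column, change the structure so that the min and max values are stored up front, or do this in a stored procedure.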
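The per-slice idea can be tried without a database by running the same query shape against an in-memory IQueryable. In this hypothetical sketch an anonymous type stands in for the Point entity and AsQueryable() stands in for context.Points, so it shows only what each query returns (at most one row per FirstOrDefault call), not the SQL EF would generate. The latest point is taken with OrderByDescending + FirstOrDefault because EF6's LINQ to Entities does not support LastOrDefault.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample slices (tuples stand in for Period).
var slices = new (DateTime Begin, DateTime End)[]
{
    (new DateTime(2016, 10, 1), new DateTime(2016, 10, 15)),
    (new DateTime(2016, 10, 16), new DateTime(2016, 10, 20)),
    (new DateTime(2016, 10, 21), new DateTime(2016, 12, 30)),
};

// Hypothetical rows: an anonymous type stands in for the Point entity,
// and AsQueryable() stands in for context.Points (no database involved).
var rows = new[]
{
    new { DT = new DateTime(2016, 10, 3), Value = 1.0m },
    new { DT = new DateTime(2016, 10, 2), Value = 2.0m },
    new { DT = new DateTime(2016, 10, 14), Value = 3.0m },
    new { DT = new DateTime(2016, 10, 22), Value = 4.0m },
    new { DT = new DateTime(2016, 11, 5), Value = 5.0m },
};
var pointsTable = rows.AsQueryable();

var found = new List<(DateTime Begin, decimal? Earliest, decimal? Latest)>();
foreach (var slice in slices)
{
    // Copy the bounds into locals so the query captures plain DateTime values.
    var begin = slice.Begin;
    var end = slice.End;
    var q = pointsTable.Where(p => p.DT >= begin && p.DT <= end);

    // Two small queries per slice; each returns one row, or null if the slice is empty.
    var min = q.OrderBy(p => p.DT).FirstOrDefault();
    var max = q.OrderByDescending(p => p.DT).FirstOrDefault();

    found.Add((begin, min?.Value, max?.Value));
}

foreach (var f in found)
    Console.WriteLine($"{f.Begin:d}: earliest {f.Earliest?.ToString() ?? "-"}, latest {f.Latest?.ToString() ?? "-"}");
```

Against a real database this issues two queries per slice, so its cost grows with the number of slices; with an index on DT each of those queries can be a short index seek.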