Return the first x items in a group of groups

Time: 2016-04-07 11:16:45

Tags: c# linq

I am querying a DataTable and I seem to be stuck on selecting a group of groups.

This code

var grouping = table.AsEnumerable()
                .Where(x => curveids.Contains(x.Field<short>("CurveID")) && x.Field<DateTime>("Timestamp").Hour >= hour && x.Field<DateTime>("Timestamp").Hour < (hour + 1))
                .GroupBy(x => x.Field<DateTime>("Timestamp")).Where(x => x.Select(y => y["CurveID"]).Count() == curveids.Count);

groups by timestamp and returns groups of x curves, where x = curveids.Count. It contains 5000 groups.

However, there can be multiple timestamps per day.

int nrdays = grouping.GroupBy(z => z.Key.Date).Count();

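tells me there are 255 distinct days.

I would now like to group this again, not by timestamp but by calendar day, and then take the first (earliest) group for each day. I tried this: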

var grouping2 = grouping.GroupBy(z => z.Key.Date).OrderBy(a => a.Key).Take(curveids.Count);

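But this returns only 4 groups, and I don't understand why. It should return 255 groups, each containing a single timestamp and x curves, so x * 255 records in total.

The DataTable has 3 columns: Timestamp (DateTime), CurveID (short), Price (double).

Update

As requested by Mr. Skeet, a complete example: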

    public class listprx
    {
        public DateTime timestamp;
        public int curveID;
        public double prx;
    }

    static void Main(string[] args)
    {
        var data = new List<listprx>();

        // populating data
        for (int i = 0; i < 50000; i++)
        {
            Random rand = new Random(i);
            var tempdt = new DateTime(2016, rand.Next(1, 12), rand.Next(1, 29), rand.Next(1, 23), rand.Next(1, 59), 0);

            if(i % 3 == 0)
            {
                data.Add(new listprx { timestamp = tempdt, curveID = 1, prx = rand.Next(1,50)});
                data.Add(new listprx { timestamp = tempdt, curveID = 2, prx = rand.Next(1, 50) });
            }
            else if (i % 5 == 0)
            {
                data.Add(new listprx { timestamp = tempdt, curveID = 1, prx = rand.Next(1, 50) });
            }
            else
            {
                data.Add(new listprx { timestamp = tempdt, curveID = 1, prx = rand.Next(1, 50) });
                data.Add(new listprx { timestamp = tempdt, curveID = 2, prx = rand.Next(1, 50) });
                data.Add(new listprx { timestamp = tempdt, curveID = 3, prx = rand.Next(1, 50) });
            }
        }

        // setting hour criteria
        int hour = 16;
        int nrcurves = 3;

        // grouping by timestamp and only taking those where all curves are present (as close to the desired time as possible)
        var grouping = data.Where(x => x.timestamp.Hour >= hour && x.timestamp.Hour < (hour + 1))
            .GroupBy(x => x.timestamp).Where(x => x.Select(y => y.curveID).Count() == nrcurves);

        // Grouping by day and take only the time stamp that is closest to the hour
        // this fails
        var grouping2 = grouping.GroupBy(z => z.Key.Date).OrderBy(a => a.Key).Take(nrcurves);

        Console.WriteLine("Nr of timestamps with all curves {0}, nr of days {1}, nr of groups in second group {2}, expected same as nr days"
            , grouping.Count(), grouping.GroupBy(z => z.Key.Date).Count(), grouping2.Count());

        Console.ReadLine();
    }

Update 2

I removed the random element and simplified further:

public class listprx
{
        public DateTime timestamp;
        public int curveID;
}

static void Main(string[] args)
{
        var data = new List<listprx>();

        // populating data
        var tempdt = new DateTime(2016, 4, 6, 16, 1, 0);

        for (int i = 0; i < 4; i++)
        {
            if (i == 2)
            {
                tempdt = tempdt.AddDays(1);
            }

            if(i % 2 == 0 )
            {
                data.Add(new listprx { timestamp = tempdt, curveID = 1});
            }
            else
            {
                data.Add(new listprx { timestamp = tempdt, curveID = 1});
                data.Add(new listprx { timestamp = tempdt, curveID = 2});
            }

            tempdt = tempdt.AddMinutes(i+1);
        }

        // setting hour criteria
        int hour = 16;
        int nrcurves = 2;

        // grouping by timestamp and only taking those where all curves are present (as close to the desired time as possible)
        var grouping = data.Where(x => x.timestamp.Hour >= hour && x.timestamp.Hour < (hour + 1))
            .GroupBy(x => x.timestamp).Where(x => x.Select(y => y.curveID).Count() == nrcurves);

        //Grouping by day and take only the time stamp that is closest to the hour
        //this fails
        var grouping2 = grouping.GroupBy(z => z.Key.Date).OrderBy(a => a.Key).Take(nrcurves);

        Console.WriteLine("Nr of timestamps with all curves {0}, nr of days {1}, nr of groups in second group {2}, expected same as nr days"
            , grouping.Count(), grouping.GroupBy(z => z.Key.Date).Count(), grouping2.Count());

        Console.ReadLine();
}

The expected final result is:

Timestamp        CurveID
------------------------
6/4/16 16:02        1
6/4/16 16:02        2
7/4/16 16:07        1
7/4/16 16:07        2

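1 answer:

Answer 0 (score: 1)

Edited to address your example.

OK, I went through your example, fixed some bugs, and arrived at an answer. Let's clean up the code a bit and comment on what went wrong.

Our model will be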

public class Curve
{
    public int CurveID { get; set; }
    public DateTime Timestamp { get; set; }
}

public class CurveGroup
{
    public DateTime Timestamp { get; set; }
    public IEnumerable<Curve> Curves { get; set; }
}

Next, a function that generates the test data:

public static List<Curve> GetData()
{
    var data = new List<Curve>();
    var startTime = new DateTime(2016, 4, 6, 16, 1, 0);

    for (int i = 0; i < 4; i++)
    {
        if (i == 2)
        {
           //startTime.AddDays(1); - this line does nothing: DateTime is an immutable struct, so every method that changes its value returns a new copy
           startTime = startTime.AddDays(1);
        }

        if (i % 2 == 0)
        {
           data.Add(CreateNewCurve(startTime, 1));
        }
        else
        {
           data.Add(CreateNewCurve(startTime, 1));
           data.Add(CreateNewCurve(startTime, 2));
        }

        //startTime.AddMinutes(i + 1); same issue as above
        startTime = startTime.AddMinutes(i + 1);
    }

    return data;
}

public static Curve CreateNewCurve(DateTime time, int curveID)
{
    return new Curve()
    {
        Timestamp = time,
        CurveID = curveID
    };
}
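
The comments in GetData about DateTime being immutable can be checked in isolation. The following is a minimal sketch; the class and method names (DateTimeDemo, DayAfterDiscardedAdd, DayAfterAssignedAdd) are made up for this illustration and are not part of the original code.

```csharp
using System;

public static class DateTimeDemo
{
    // Calls AddDays but throws the result away, so the variable keeps its old value.
    public static int DayAfterDiscardedAdd()
    {
        var start = new DateTime(2016, 4, 6, 16, 1, 0);
        start.AddDays(1);            // returns a new DateTime that is never stored
        return start.Day;            // still 6
    }

    // Assigns the returned value back, which is what the fixed GetData does.
    public static int DayAfterAssignedAdd()
    {
        var start = new DateTime(2016, 4, 6, 16, 1, 0);
        start = start.AddDays(1);    // keep the new value
        return start.Day;            // now 7
    }

    public static void Main()
    {
        Console.WriteLine(DayAfterDiscardedAdd());
        Console.WriteLine(DayAfterAssignedAdd());
    }
}
```

The same applies to AddMinutes, AddHours and the other Add* methods: each returns a new value and leaves the original untouched.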

And here is the main function

static void Main(string[] args)
{
    var data = GetData();

    int hour = 16;
    int totalCurveCount = 2;

    var grouping = data
           .Where(x => x.Timestamp.Hour >= hour && x.Timestamp.Hour < (hour + 1))
           .GroupBy(x => x.Timestamp)
           .Where(x => x.Count() == totalCurveCount); //there is no need to select curveId like in your code: Where(x => x.Select(y => y.curveID).Count() == nrcurves);

    var grouping2 = grouping
           .GroupBy(x => x.Key.Date)
           .Select(x =>
                new CurveGroup
                {
                   Timestamp = x.Key,
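                   // Take(1), not Take(totalCurveCount): the elements of x are timestamp groups, and only the earliest one per day is wanted
                   Curves = x.OrderBy(c => c.Key).Take(1).SelectMany(c => c)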
                }
           );


    foreach (var g in grouping2)
    {
        foreach (var c in g.Curves)
        {
            Console.WriteLine(c.Timestamp);
            Console.WriteLine(c.CurveID);
        }
    }
}

This returns the expected result.
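
The question was originally about a DataTable rather than a List, so here is a rough sketch of how the same idea might be carried back to it. The helper names (DataTableSketch, EarliestCompleteGroupPerDay, BuildSample) and the sample rows are invented for this illustration. It assumes the column names and types given in the question, that curveids contains no duplicates, that each (timestamp, curve) pair occurs at most once, and that AsEnumerable and Field are available (on .NET Framework this needs a reference to System.Data.DataSetExtensions).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableSketch
{
    // Hypothetical helper: for each calendar day, returns the rows of the
    // earliest timestamp (within the given hour) that has every requested curve.
    public static List<DataRow> EarliestCompleteGroupPerDay(
        DataTable table, ICollection<short> curveids, int hour)
    {
        return table.AsEnumerable()
            .Where(r => curveids.Contains(r.Field<short>("CurveID"))
                     && r.Field<DateTime>("Timestamp").Hour == hour)
            .GroupBy(r => r.Field<DateTime>("Timestamp"))
            // keep only timestamps for which every requested curve is present
            .Where(g => g.Count() == curveids.Count)
            .GroupBy(g => g.Key.Date)
            .OrderBy(day => day.Key)
            // inside each day, keep just the earliest timestamp group
            .SelectMany(day => day.OrderBy(g => g.Key).First())
            .ToList();
    }

    // Made-up sample data with the three columns described in the question.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("Timestamp", typeof(DateTime));
        table.Columns.Add("CurveID", typeof(short));
        table.Columns.Add("Price", typeof(double));

        var day1 = new DateTime(2016, 4, 6, 16, 0, 0);
        var day2 = day1.AddDays(1);
        // day 1: 16:01 is incomplete, 16:02 and 16:05 are complete
        table.Rows.Add(day1.AddMinutes(1), (short)1, 10.0);
        table.Rows.Add(day1.AddMinutes(5), (short)1, 11.0);
        table.Rows.Add(day1.AddMinutes(5), (short)2, 12.0);
        table.Rows.Add(day1.AddMinutes(2), (short)1, 13.0);
        table.Rows.Add(day1.AddMinutes(2), (short)2, 14.0);
        // day 2: only 16:07 is complete
        table.Rows.Add(day2.AddMinutes(7), (short)1, 15.0);
        table.Rows.Add(day2.AddMinutes(7), (short)2, 16.0);
        return table;
    }

    public static void Main()
    {
        var rows = EarliestCompleteGroupPerDay(BuildSample(), new List<short> { 1, 2 }, 16);
        foreach (var r in rows)
            Console.WriteLine("{0:yyyy-MM-dd HH:mm} {1}",
                r.Field<DateTime>("Timestamp"), r.Field<short>("CurveID"));
    }
}
```

With the sample above, the 16:05 group on the first day is dropped because 16:02 is earlier, so each day contributes exactly curveids.Count rows.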

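Your code failed because the second query applies Take(nrcurves) to the sequence of day groups rather than to the contents of each day group. So instead of returning 255 groups, each holding only the earliest timestamp's curves, it returns just the first nrcurves day groups (2 in this example), each still containing every timestamp group for that day.

Hope this solves your problem.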
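
To make the difference concrete, here is a small self-contained sketch that contrasts taking from the outer sequence of day groups with selecting inside each day group. The names (TakeDemo, Sample, DaysWhenTakingOutside, EarliestPerDay) and the sample data are invented for this illustration, and the value tuples assume C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Made-up sample: three days, two timestamps per day, two curves per timestamp.
    public static List<(DateTime Timestamp, int CurveID)> Sample()
    {
        var rows = new List<(DateTime Timestamp, int CurveID)>();
        foreach (var day in new[] { 6, 7, 8 })
            foreach (var minute in new[] { 5, 2 })   // deliberately out of order
                foreach (var curve in new[] { 1, 2 })
                    rows.Add((new DateTime(2016, 4, day, 16, minute, 0), curve));
        return rows;
    }

    // Take applied to the sequence of day groups: it limits the number of days.
    public static int DaysWhenTakingOutside(
        IEnumerable<(DateTime Timestamp, int CurveID)> rows, int n)
    {
        return rows.GroupBy(r => r.Timestamp)
                   .GroupBy(g => g.Key.Date)
                   .OrderBy(d => d.Key)
                   .Take(n)
                   .Count();
    }

    // First applied inside each day group: one timestamp group per day survives.
    public static List<DateTime> EarliestPerDay(
        IEnumerable<(DateTime Timestamp, int CurveID)> rows)
    {
        return rows.GroupBy(r => r.Timestamp)
                   .GroupBy(g => g.Key.Date)
                   .OrderBy(d => d.Key)
                   .Select(d => d.OrderBy(g => g.Key).First().Key)
                   .ToList();
    }

    public static void Main()
    {
        var rows = Sample();
        Console.WriteLine(DaysWhenTakingOutside(rows, 2));   // only 2 of the 3 days remain
        foreach (var ts in EarliestPerDay(rows))
            Console.WriteLine(ts.ToString("yyyy-MM-dd HH:mm"));
    }
}
```

The outer Take never looks inside a day, which is why the number of groups in the question matched the number of curves instead of the number of days.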