I am querying a DataTable and I seem to be stuck on selecting a group of groups.
This code
var grouping = table.AsEnumerable()
.Where(x => curveids.Contains(x.Field<short>("CurveID")) && x.Field<DateTime>("Timestamp").Hour >= hour && x.Field<DateTime>("Timestamp").Hour < (hour + 1))
.GroupBy(x => x.Field<DateTime>("Timestamp")).Where(x => x.Select(y => y["CurveID"]).Count() == curveids.Count);
groups by timestamp and returns groups of x curves, where x = curveids.Count. It contains 5000 groups.
However, each day can have more than one timestamp.
int nrdays = grouping.GroupBy(z => z.Key.Date).Count();
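tells me there are 255 distinct days.
I would now like to group this again, not by timestamp but by calendar day, and then take the first (earliest) group for each day. I tried this: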
var grouping2 = grouping.GroupBy(z => z.Key.Date).OrderBy(a => a.Key).Take(curveids.Count);
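But this returns only 4 groups, and I don't understand why. It should return 255 groups, each containing a single timestamp and x curves, so x * 255 records in total.
The DataTable has 3 columns: Timestamp (DateTime), CurveID (short), Price (double).
UPDATE
As requested by Mr. Skeet, a complete example: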
public class listprx
{
public DateTime timestamp;
public int curveID;
public double prx;
}
static void Main(string[] args)
{
var data = new List<listprx>();
// populating data
for (int i = 0; i < 50000; i++)
{
Random rand = new Random(i);
var tempdt = new DateTime(2016, rand.Next(1, 12), rand.Next(1, 29), rand.Next(1, 23), rand.Next(1, 59), 0);
if(i % 3 == 0)
{
data.Add(new listprx { timestamp = tempdt, curveID = 1, prx = rand.Next(1,50)});
data.Add(new listprx { timestamp = tempdt, curveID = 2, prx = rand.Next(1, 50) });
}
else if (i % 5 == 0)
{
data.Add(new listprx { timestamp = tempdt, curveID = 1, prx = rand.Next(1, 50) });
}
else
{
data.Add(new listprx { timestamp = tempdt, curveID = 1, prx = rand.Next(1, 50) });
data.Add(new listprx { timestamp = tempdt, curveID = 2, prx = rand.Next(1, 50) });
data.Add(new listprx { timestamp = tempdt, curveID = 3, prx = rand.Next(1, 50) });
}
}
// setting hour criteria
int hour = 16;
int nrcurves = 3;
// group by timestamp and keep only the timestamps where all curves are present (as close to the desired time as possible)
var grouping = data.Where(x => x.timestamp.Hour >= hour && x.timestamp.Hour < (hour + 1))
.GroupBy(x => x.timestamp).Where(x => x.Select(y => y.curveID).Count() == nrcurves);
// Grouping by day and take only the time stamp that is closest to the hour
// this fails
var grouping2 = grouping.GroupBy(z => z.Key.Date).OrderBy(a => a.Key).Take(nrcurves);
Console.WriteLine("Nr of timestamps with all curves {0}, nr of days {1}, nr of groups in second group {2}, expected same as nr days"
, grouping.Count(), grouping.GroupBy(z => z.Key.Date).Count(), grouping2.Count());
Console.ReadLine();
}
UPDATE 2
I have removed the random element and simplified further:
public class listprx
{
public DateTime timestamp;
public int curveID;
}
static void Main(string[] args)
{
var data = new List<listprx>();
// populating data
var tempdt = new DateTime(2016, 4, 6, 16, 1, 0);
for (int i = 0; i < 4; i++)
{
if (i == 2)
{
tempdt = tempdt.AddDays(1);
}
if(i % 2 == 0 )
{
data.Add(new listprx { timestamp = tempdt, curveID = 1});
}
else
{
data.Add(new listprx { timestamp = tempdt, curveID = 1});
data.Add(new listprx { timestamp = tempdt, curveID = 2});
}
tempdt = tempdt.AddMinutes(i+1);
}
// setting hour criteria
int hour = 16;
int nrcurves = 2;
// group by timestamp and keep only the timestamps where all curves are present (as close to the desired time as possible)
var grouping = data.Where(x => x.timestamp.Hour >= hour && x.timestamp.Hour < (hour + 1))
.GroupBy(x => x.timestamp).Where(x => x.Select(y => y.curveID).Count() == nrcurves);
//Grouping by day and take only the time stamp that is closest to the hour
//this fails
var grouping2 = grouping.GroupBy(z => z.Key.Date).OrderBy(a => a.Key).Take(nrcurves);
Console.WriteLine("Nr of timestamps with all curves {0}, nr of days {1}, nr of groups in second group {2}, expected same as nr days"
, grouping.Count(), grouping.GroupBy(z => z.Key.Date).Count(), grouping2.Count());
Console.ReadLine();
}
The expected final result is:
Timestamp CurveID
------------------------
6/4/16 16:02 1
6/4/16 16:02 2
7/4/16 16:07 1
7/4/16 16:07 2
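Answer 0 (score: 1)
Edited to answer your example.
OK, I went through your example and fixed a few bugs along the way. Let's clean up the code a little and comment on what went wrong.
Our models will be: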
public class Curve
{
public int CurveID { get; set; }
public DateTime Timestamp { get; set; }
}
public class CurveGroup
{
public DateTime Timestamp { get; set; }
public IEnumerable<Curve> Curves { get; set; }
}
Next is the function that generates the test data:
public static List<Curve> GetData()
{
var data = new List<Curve>();
var startTime = new DateTime(2016, 4, 6, 16, 1, 0);
for (int i = 0; i < 4; i++)
{
if (i == 2)
{
//startTime.AddDays(1); - this line alone does nothing: DateTime is an immutable struct, so every method that "changes" its value returns a new copy
startTime = startTime.AddDays(1);
}
if (i % 2 == 0)
{
data.Add(CreateNewCurve(startTime, 1));
}
else
{
data.Add(CreateNewCurve(startTime, 1));
data.Add(CreateNewCurve(startTime, 2));
}
//startTime.AddMinutes(i + 1); same issue as above
startTime = startTime.AddMinutes(i + 1);
}
return data;
}
public static Curve CreateNewCurve(DateTime time, int curveID)
{
return new Curve()
{
Timestamp = time,
CurveID = curveID
};
}
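The comments in GetData about DateTime can be checked in isolation. The following is only a minimal sketch; the class and method names (DateTimeImmutabilityDemo, DiscardResult, AssignResult) are made up for illustration and are not part of the original code.

```csharp
using System;

public static class DateTimeImmutabilityDemo
{
    // Calls AddDays but ignores the return value, so the local copy is unchanged.
    public static DateTime DiscardResult(DateTime start)
    {
        start.AddDays(1); // result is thrown away
        return start;
    }

    // Assigns the value returned by AddDays back to the variable.
    public static DateTime AssignResult(DateTime start)
    {
        start = start.AddDays(1);
        return start;
    }

    public static void Main()
    {
        var start = new DateTime(2016, 4, 6, 16, 1, 0);
        Console.WriteLine(DiscardResult(start).Day); // prints 6
        Console.WriteLine(AssignResult(start).Day);  // prints 7
    }
}
```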
And here is the main function:
static void Main(string[] args)
{
var data = GetData();
int hour = 16;
int totalCurveCount = 2;
var grouping = data
.Where(x => x.Timestamp.Hour >= hour && x.Timestamp.Hour < (hour + 1))
.GroupBy(x => x.Timestamp)
.Where(x => x.Count() == totalCurveCount); //there is no need to select curveId like in your code: Where(x => x.Select(y => y.curveID).Count() == nrcurves);
var grouping2 = grouping
.GroupBy(x => x.Key.Date)
.Select(x =>
new CurveGroup
{
Timestamp = x.Key,
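// keep only the earliest timestamp group of the day; Take(totalCurveCount) here would keep that many timestamp groups, not that many curves
Curves = x.OrderBy(c => c.Key).Take(1).SelectMany(c => c)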
}
);
foreach (var g in grouping2)
{
foreach (var c in g.Curves)
{
Console.WriteLine(c.Timestamp);
Console.WriteLine(c.CurveID);
}
}
}
This returns the expected result.
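Your code failed because Take(nrcurves) in your second query was applied to the sequence of day groups themselves, not to the items inside each group. So instead of returning 255 groups with 2 values each, it returned 2 groups containing all of their values.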
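To make that difference concrete, here is a small self-contained sketch. The sample data (3 days, two complete timestamps per day) and the names EarliestGroupPerDay, SampleData, CompleteTimestamps, DaysWithOuterTake and EarliestPerDay are assumptions made up for this illustration. It contrasts Take applied to the day groups, which caps the number of days, with ordering inside each day and picking the first timestamp group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Curve
{
    public int CurveID { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class EarliestGroupPerDay
{
    // Hypothetical data: 6, 7 and 8 April 2016, each with complete timestamps at 16:10 and 16:05.
    public static List<Curve> SampleData()
    {
        var data = new List<Curve>();
        for (int day = 6; day <= 8; day++)
            foreach (int minute in new[] { 10, 5 }) // deliberately not in time order
                foreach (int id in new[] { 1, 2 })
                    data.Add(new Curve { CurveID = id, Timestamp = new DateTime(2016, 4, day, 16, minute, 0) });
        return data;
    }

    // Timestamp groups that contain exactly curveCount curves.
    public static IEnumerable<IGrouping<DateTime, Curve>> CompleteTimestamps(IEnumerable<Curve> data, int curveCount)
    {
        return data.GroupBy(c => c.Timestamp).Where(g => g.Count() == curveCount);
    }

    // Shape of the query in the question: Take limits how many DAY groups come back.
    public static int DaysWithOuterTake(IEnumerable<Curve> data, int curveCount)
    {
        return CompleteTimestamps(data, curveCount)
            .GroupBy(g => g.Key.Date)
            .OrderBy(d => d.Key)
            .Take(curveCount)
            .Count();
    }

    // For every day, order its timestamp groups and flatten only the earliest one.
    public static List<Curve> EarliestPerDay(IEnumerable<Curve> data, int curveCount)
    {
        return CompleteTimestamps(data, curveCount)
            .GroupBy(g => g.Key.Date)
            .OrderBy(d => d.Key)
            .SelectMany(d => d.OrderBy(g => g.Key).First())
            .ToList();
    }

    public static void Main()
    {
        var data = SampleData();
        Console.WriteLine(DaysWithOuterTake(data, 2));    // prints 2, although there are 3 days
        Console.WriteLine(EarliestPerDay(data, 2).Count); // prints 6: 3 days x 2 curves
    }
}
```

In this sketch every curve returned by EarliestPerDay carries the 16:05 timestamp of its day, since that is the earliest complete timestamp in the sample data.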
Hope this solves your problem.