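I have a data table, and for each day I want to select the first entry at which all curveIDs are present. The only approach I could think of is a join, because a join only matches rows that exist in both data sets.
This is what I have so far: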
// core data from SQL (I have little control over this)
DataTable ds = new DataTable();
da.Fill(ds);
// creating a dataset with one table per curveID I look for
System.Data.DataSet dataSet = new System.Data.DataSet();
for (int i = 0; i < curveIds.Length; i++)
{
    dataSet.Tables.Add(ds.AsEnumerable().Where(x => x.Field<short>("curveID") == curveIds[i]).CopyToDataTable());
}
// let's say I only have two; I join them like this to match timestamps correctly
var result = from table1 in dataSet.Tables[0].AsEnumerable()
             join table2 in dataSet.Tables[1].AsEnumerable()
             on table1["Timestamp"] equals table2["Timestamp"]
             select new
             {
                 Timestamp = (DateTime)table1["Timestamp"],
                 Spread = (double)table1["mid"] - 0.4 * (double)table2["mid"],
                 Power = (double)table1["mid"]
             };
// lastly I do a FirstOrDefault over the data, as I only want the first timestamp where both are present (this step doesn't return the correct data)
var endres = result.OrderBy(a => a.Timestamp).GroupBy(a => a.Timestamp.ToShortDateString()).FirstOrDefault().ToList();
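This seems quite convoluted. The last step also doesn't return one record per day (the early-morning one); instead it returns many records from a single day.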
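The symptom in the last step follows from what `FirstOrDefault()` does on a grouping: it returns the first *group*, i.e. all rows of the earliest day, not the first row of each day. A minimal sketch of the alternative is to take the earliest element of every group. The list of tuples stands in for the joined `result` and is made up for illustration; the helper name `FirstPerDay` is hypothetical. It also groups on `Timestamp.Date` rather than `ToShortDateString()`, so the key does not depend on the current culture's date format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the joined `result`: only timestamps where both curves exist.
var joined = new List<(DateTime Timestamp, double Spread, double Power)>
{
    (new DateTime(2016, 4, 4, 8, 1, 0), 1.0, 10.0),
    (new DateTime(2016, 4, 4, 8, 2, 0), 2.0, 20.0),
    (new DateTime(2016, 4, 5, 8, 3, 0), 3.0, 30.0),
    (new DateTime(2016, 4, 5, 8, 4, 0), 4.0, 40.0),
};

// Original last step: FirstOrDefault() returns the first *group*,
// i.e. every row of the earliest day (here 8:01 and 8:02 on day 1).
var firstGroup = joined.OrderBy(a => a.Timestamp)
                       .GroupBy(a => a.Timestamp.ToShortDateString())
                       .FirstOrDefault()
                       .ToList();
Console.WriteLine($"FirstOrDefault(): {firstGroup.Count} rows from one day");

// Alternative: one row per day, the earliest one in each group.
foreach (var row in FirstPerDay(joined))
    Console.WriteLine($"{row.Timestamp:yyyy-MM-dd HH:mm} Spread={row.Spread} Power={row.Power}");

static List<(DateTime Timestamp, double Spread, double Power)> FirstPerDay(
    IEnumerable<(DateTime Timestamp, double Spread, double Power)> rows) =>
    rows.OrderBy(a => a.Timestamp)
        .GroupBy(a => a.Timestamp.Date)   // DateTime key instead of a culture-formatted string
        .Select(g => g.First())           // GroupBy keeps the sorted order inside each group
        .ToList();
```

Sorting before grouping is enough here because `GroupBy` yields the elements of each group in source order.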
In the full problem I have to do this for 4 to 6 curveIDs, which means a variable number of joins, and that makes this approach impractical.
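One way to avoid a variable number of joins is to pivot instead: group the rows by timestamp, turn each group into a curveID → mid dictionary, and keep only the timestamps whose dictionary contains every requested ID. This sketch assumes the column names and types from the question's code (`Timestamp`, `curveID` as `short`, `mid` as `double`); the sample rows and the helper name `CompleteSnapshots` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample table using the column names and types from the question's code.
var table = new DataTable();
table.Columns.Add("Timestamp", typeof(DateTime));
table.Columns.Add("curveID", typeof(short));
table.Columns.Add("mid", typeof(double));

var t1 = new DateTime(2016, 4, 4, 8, 1, 0);
var t2 = new DateTime(2016, 4, 4, 8, 2, 0);
table.Rows.Add(t1, (short)1, 50.0);
table.Rows.Add(t1, (short)2, 40.0);   // curve 3 is missing at t1
table.Rows.Add(t2, (short)1, 51.0);
table.Rows.Add(t2, (short)2, 41.0);
table.Rows.Add(t2, (short)3, 30.0);

short[] curveIds = { 1, 2, 3 };      // works the same for 2 or 6 IDs
var snapshots = CompleteSnapshots(table, curveIds);

foreach (var (ts, byCurve) in snapshots)
    Console.WriteLine($"{ts}: {string.Join(", ", byCurve.Select(kv => $"{kv.Key}={kv.Value}"))}");

// Pivots the long table into one curveID -> mid dictionary per timestamp and keeps
// only the timestamps at which every requested curve has a row. If a curve has
// duplicate rows at one timestamp, the first one wins.
static List<(DateTime Timestamp, Dictionary<short, double> Mids)> CompleteSnapshots(
    DataTable source, IReadOnlyCollection<short> wanted)
{
    return source.AsEnumerable()
        .Where(row => wanted.Contains(row.Field<short>("curveID")))
        .GroupBy(row => row.Field<DateTime>("Timestamp"))
        .Select(g => (Timestamp: g.Key,
                      Mids: g.GroupBy(row => row.Field<short>("curveID"))
                             .ToDictionary(c => c.Key, c => c.First().Field<double>("mid"))))
        .Where(s => wanted.All(s.Mids.ContainsKey))
        .OrderBy(s => s.Timestamp)
        .ToList();
}
```

Because the curve IDs are only dictionary keys, the number of curves no longer changes the shape of the query; the per-day selection and the spread formula can then work on these snapshots.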
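The source data has the columns (Timestamp, CurveID, Mid) for every minute between 8 am and 4 pm on weekdays, but there is no guarantee that all curveIDs are actually present at every timestamp.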
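Say that on day 1 all IDs are present at 8:01 (the first time this is true, but not the only one), and that on day 2 all IDs are present together only from 8:03. Then the returned data should be:
Day1 8:01, spread = x, Power = y
Day2 8:03, spread = z, Power = a
...and so on, with exactly one entry per day: the first one at which all IDs are present.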
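The expected output described above can be reproduced on a small data set. This sketch assumes the two-curve case from the question's join, with the first ID playing `table1` (Power) and the second playing `table2`; the rows, which mimic the Day 1 / Day 2 scenario, and the helper name `FirstCompletePerDay` are hypothetical. With 4 to 6 curves the spread formula would have to be defined for the extra curves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: two-curve case, ids[0] plays "table1" (Power) and ids[1] plays "table2".
short[] ids = { 1, 2 };

// Made-up rows mimicking the scenario: day 1 complete from 8:01, day 2 only from 8:03.
var rows = new List<(DateTime Timestamp, short CurveId, double Mid)>
{
    (new DateTime(2016, 4, 4, 8, 1, 0), 1, 50.0),
    (new DateTime(2016, 4, 4, 8, 1, 0), 2, 40.0),
    (new DateTime(2016, 4, 4, 8, 2, 0), 1, 51.0),
    (new DateTime(2016, 4, 4, 8, 2, 0), 2, 41.0),
    (new DateTime(2016, 4, 5, 8, 1, 0), 1, 52.0),   // curve 2 missing at 8:01
    (new DateTime(2016, 4, 5, 8, 2, 0), 2, 42.0),   // curve 1 missing at 8:02
    (new DateTime(2016, 4, 5, 8, 3, 0), 1, 53.0),
    (new DateTime(2016, 4, 5, 8, 3, 0), 2, 43.0),
};

foreach (var d in FirstCompletePerDay(rows, ids))
    Console.WriteLine($"{d.Timestamp:yyyy-MM-dd HH:mm}, Spread = {d.Spread}, Power = {d.Power}");

static List<(DateTime Timestamp, double Spread, double Power)> FirstCompletePerDay(
    IEnumerable<(DateTime Timestamp, short CurveId, double Mid)> source, short[] wanted)
{
    return source
        .Where(x => wanted.Contains(x.CurveId))
        .GroupBy(x => x.Timestamp)                                       // one group per minute
        .Where(g => g.Select(x => x.CurveId).Distinct().Count() == wanted.Length) // all curves present
        .GroupBy(g => g.Key.Date)                                        // complete minutes per day
        .Select(day => day.OrderBy(g => g.Key).First())                  // earliest complete minute
        .Select(g =>
        {
            double power = g.First(x => x.CurveId == wanted[0]).Mid;
            double other = g.First(x => x.CurveId == wanted[1]).Mid;
            return (Timestamp: g.Key, Spread: power - 0.4 * other, Power: power);
        })
        .OrderBy(r => r.Timestamp)
        .ToList();
}
```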
Answer 0 (score: 1)
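If I understand correctly, you want to find, for each day, the lowest timestamp (in your data table) that contains all the "curveIDs" in your curveID list?
If so, I wrote some code that may solve it. If something is wrong, let me know in the comments. Working with lists is easier to follow than setting up data tables, so I kept only your "ds" data table and wrote self-contained code around it.
There is room for optimization, but it would make the code harder to understand.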
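One detail worth checking against the question: the code that follows tests whether every curve ID occurs somewhere in the day, while the question asks for a single minute at which all IDs are present together. The two differ on a day where the curves appear only in different minutes. A minimal sketch contrasting the two checks with `HashSet<T>.IsSubsetOf`; the rows and the helper names `DayHasAll` and `FirstCompleteMinute` are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var wanted = new HashSet<int> { 1, 2 };

// Made-up day on which both curves appear, but never in the same minute.
var day = new List<(DateTime Timestamp, int CurveId)>
{
    (new DateTime(2016, 4, 4, 8, 1, 0), 1),
    (new DateTime(2016, 4, 4, 8, 2, 0), 2),
};

bool dayHasAll = DayHasAll(day, wanted);                   // true: both IDs occur somewhere that day
DateTime? firstMinute = FirstCompleteMinute(day, wanted);  // null: no single minute has both
Console.WriteLine($"per-day check: {dayHasAll}, first complete minute: {firstMinute?.ToString() ?? "none"}");

// Per-day check: every wanted ID occurs at least once in the day's rows.
static bool DayHasAll(IEnumerable<(DateTime Timestamp, int CurveId)> rows, HashSet<int> ids) =>
    ids.IsSubsetOf(rows.Select(r => r.CurveId));

// Per-minute check: earliest timestamp whose own rows contain every wanted ID, or null.
static DateTime? FirstCompleteMinute(IEnumerable<(DateTime Timestamp, int CurveId)> rows, HashSet<int> ids) =>
    rows.GroupBy(r => r.Timestamp)
        .Where(g => ids.IsSubsetOf(g.Select(r => r.CurveId)))
        .Select(g => (DateTime?)g.Key)
        .Min();
```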
DataTable ds = new DataTable();
List<int> curveIds = new List<int>() { 1, 2, 3, 4 };
public void Test()
{
    LoadDs();
    List<object> endress = new List<object>();
    // filter all timestamps, getting only the date info
    var timeStamps = ds.AsEnumerable().Select(r => ((DateTime)r["Timestamp"]).Date).Distinct();
    // for each day
    foreach (var timeStamp in timeStamps)
    {
        // find all rows on the same day
        var listSameTimestamp = ds.AsEnumerable().Where(r => ((DateTime)r["Timestamp"]).Date == timeStamp);
        var listIds = listSameTimestamp.Select(r => (int)r["curveID"]).Distinct();
        // ensure they include all the curveIDs you are looking for
        var haveThemAll = curveIds.Intersect(listIds).Count() == curveIds.Count();
        if (haveThemAll == false)
            continue;
        // find the lowest timestamp
        var rowFound = listSameTimestamp.OrderBy(r => (DateTime)r["Timestamp"]).FirstOrDefault();
        if (rowFound == null)
            continue;
        // create an anonymous object (I could not fully understand your needs)
        endress.Add(new
        {
            Timestamp = (DateTime)rowFound["Timestamp"],
            Spread = (double)rowFound["mid"] - 0.4 * (double)rowFound["mid"],
            Power = (double)rowFound["mid"]
        });
    }
    foreach (var o in endress)
    {
        Console.WriteLine(o);
    }
}
public void LoadDs()
{
    // random test data: curve IDs 1-4, days 1-4 of April 2016, hours 1-2
    ds = new DataTable();
    ds.Columns.Add("curveID", typeof(int));
    ds.Columns.Add("Timestamp", typeof(DateTime));
    ds.Columns.Add("mid", typeof(double));
    for (int i = 0; i < 50000; i++)
    {
        Random rand = new Random(i);
        var row = ds.NewRow();
        row["curveID"] = rand.Next(1, 5);
        row["Timestamp"] = new DateTime(2016, 4, rand.Next(1, 5), rand.Next(1, 3), 0, 0);
        row["mid"] = rand.NextDouble();
        ds.Rows.Add(row);
    }
}
This is the "main" part. You can see the complete test code here:
Answer 1 (score: 1)
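If I understand correctly:
1. You have a table with Timestamp, CurveID and Mid columns
2. Timestamps are (at least usually) one minute apart, and not all curves are guaranteed to be present
3. You want to compute spread and power from the rows at the first timestamp where all required curves are present
I'd suggest something like this: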
// I'll pretend the curve IDs are in this list...
List<short> curveids = new List<short>();
DataTable table = ds; // ds is the DataTable filled from SQL in the question
// first group the rows by timestamp, keeping only rows with the curve IDs of interest
// set up mindate and maxdate (DateTime) of your choosing...
var grouping = table.AsEnumerable()
    .Where(x => curveids.Contains(x.Field<short>("curveID")) &&
                x.Field<DateTime>("Timestamp") > mindate &&
                x.Field<DateTime>("Timestamp") < maxdate)
    .GroupBy(x => x.Field<DateTime>("Timestamp"));
// this gives an IEnumerable<IGrouping<DateTime, DataRow>>,
// i.e. timestamps, each with the group of rows whose curve IDs are in your selection
// Now get the minimum timestamp where all curve IDs are present
// (Min() throws InvalidOperationException if no timestamp qualifies)
DateTime minTimestamp = grouping
    .Where(x => x.Select(y => y.Field<short>("curveID")).Distinct().Count() == curveids.Count)
    .Select(x => x.Key)
    .Min();
// ... now you can do what you wish with that...
// For example:
var resultRows = table.AsEnumerable().Where(x =>
    x.Field<DateTime>("Timestamp") == minTimestamp &&
    curveids.Contains(x.Field<short>("curveID")));
Now you can use resultRows to compute spread, power, etc. according to your formula.
Answer 2 (score: 0)
Here's my take:
// assumes: dt is the source DataTable, ids is a List<short> of the wanted curve IDs,
// and spread()/power() are your own helper methods implementing the formulas
// selecting into an object for better readability and access
var result = dt.AsEnumerable().Select(r => new
    {
        TimeStamp = r.Field<DateTime>("TimeStamp"),
        CurveID = r.Field<short>("CurveId"),
        Mid = r.Field<double>("Mid")
    })
    // ignoring rows whose curve ID is not in the list
    .Where(item => ids.Contains(item.CurveID))
    // grouping by timestamp
    .GroupBy(item => item.TimeStamp)
    // keeping only groups that have all curve IDs
    .Where(g => g.Select(i => i.CurveID).Distinct().Count() == ids.Count)
    // grouping the groups by date
    .GroupBy(g => g.Key.Date)
    .Select(g2 =>
    {
        // getting the earliest timestamp group of the day
        var min = g2.OrderBy(i => i.Key).First();
        // getting all the Mid values
        var values = min.Select(i => i.Mid).ToList();
        // returning the desired computation
        return new
        {
            TimeStamp = min.Key,
            Spread = spread(values),
            Power = power(values)
        };
    })
    .ToList();
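My assumptions from the question text and the existing comments are:
I should add that this is not the most efficient way to do it, because it makes several passes over the data: first filtering by ID and grouping by timestamp, then keeping only the groups that contain every curve ID, then grouping by date, and finally taking the first timestamp of each day. A faster but less readable implementation would sort first and then make a single pass over the items.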
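A minimal sketch of the sort-then-single-pass idea, assuming tuple rows `(Timestamp, CurveId, Mid)` rather than a `DataTable`; the method name `FirstCompletePerDaySinglePass` and the sample data are hypothetical. After one sort, each row is visited once: the mids of the current minute are collected into a dictionary, and when the minute changes, that minute is emitted if it is complete and its day has no entry yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

short[] ids = { 1, 2 };
// Made-up rows, deliberately unsorted.
var rows = new List<(DateTime Timestamp, short CurveId, double Mid)>
{
    (new DateTime(2016, 4, 5, 8, 3, 0), 2, 43.0),
    (new DateTime(2016, 4, 4, 8, 1, 0), 1, 50.0),
    (new DateTime(2016, 4, 5, 8, 1, 0), 1, 52.0),
    (new DateTime(2016, 4, 4, 8, 1, 0), 2, 40.0),
    (new DateTime(2016, 4, 5, 8, 3, 0), 1, 53.0),
    (new DateTime(2016, 4, 4, 8, 2, 0), 1, 51.0),
};

foreach (var (ts, snapshot) in FirstCompletePerDaySinglePass(rows, ids))
    Console.WriteLine($"{ts}: {string.Join(", ", snapshot.Select(kv => $"{kv.Key}={kv.Value}"))}");

// Sorts once, then visits each row a single time: collects the mids of the current
// minute and, when the minute changes, emits it if it is complete and its day has no entry yet.
static List<(DateTime Timestamp, Dictionary<short, double> Mids)> FirstCompletePerDaySinglePass(
    IEnumerable<(DateTime Timestamp, short CurveId, double Mid)> source, short[] wanted)
{
    var wantedSet = new HashSet<short>(wanted);
    var result = new List<(DateTime Timestamp, Dictionary<short, double> Mids)>();
    var current = new Dictionary<short, double>();
    DateTime? currentMinute = null;
    DateTime? lastEmittedDay = null;

    void Flush()
    {
        if (currentMinute is DateTime minute && current.Count == wantedSet.Count
            && lastEmittedDay != minute.Date)
        {
            result.Add((minute, new Dictionary<short, double>(current)));
            lastEmittedDay = minute.Date;
        }
        current.Clear();
    }

    foreach (var r in source.Where(x => wantedSet.Contains(x.CurveId)).OrderBy(x => x.Timestamp))
    {
        if (r.Timestamp != currentMinute)
        {
            Flush();
            currentMinute = r.Timestamp;
        }
        if (lastEmittedDay != r.Timestamp.Date)   // days that already have an entry are skipped
            current[r.CurveId] = r.Mid;
    }
    Flush();                                      // the last minute in the data
    return result;
}
```

The sort still costs O(n log n), so the gain over the grouped query is mainly fewer intermediate collections rather than a better asymptotic bound.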