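I have time series data in an IEnumerable, sampled at uneven intervals (i.e. I might have 5 samples in the first 10 seconds, then 10 samples in the next 10 seconds, and so on).
I want to compute a rolling mean, maximum and minimum over a 30-second rolling window.
I believe Skip enumerates from the beginning every time it is called.
Is it possible to keep the result of a Skip and use it again without calling Skip again?
Is it possible to copy an iterator in C#? I would like to have a beginWindow and an endWindow iterator and enumerate between them, which would mean I don't have to iterate from the start each time.
My code currently works and looks like this: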
class Data
{
    public float Value;
    public DateTime Time;
}

IEnumerable<Data> BufferData = ...;
int index = 0;
TimeSpan windowWidth = new TimeSpan(0, 0, 30);
DateTime currentStart;
// Count() walks the whole sequence, and every Skip(index) restarts from the beginning.
while (index < BufferData.Count())
{
    currentStart = BufferData.Skip(index).First().Time;
    var window = BufferData.Skip(index).TakeWhile(x => x.Time <= currentStart + windowWidth);
    DateTime centre = currentStart + new TimeSpan((window.Last().Time - currentStart).Ticks / 2);
    float min = window.Min(x => x.Value);
    float max = window.Max(x => x.Value);
    ++index;
}
Answer 0 (score: 1)
If you are happy to use the "Interactive Extensions" from Microsoft's Reactive Framework team (NuGet "Ix-Main"), then this is a fairly straightforward solution:
var windows =
    BufferData
        .Scan(new List<Data>(), (accumulator, item) =>
            accumulator
                .Where(x => x.Time.AddSeconds(30.0) >= item.Time)
                .Concat(new[] { item })
                .ToList())
        .Select(xs => new
        {
            Centre = xs.First().Time.AddSeconds(
                xs.Last().Time.Subtract(xs.First().Time).TotalSeconds / 2.0),
            Max = xs.Max(x => x.Value),
            Min = xs.Min(x => x.Value),
        });
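The Scan operator uses an accumulator just like the standard .Aggregate operator, but produces a value for each input.
This should give the same results as your current code.
It also iterates the original source only once (although it does iterate multiple times within each 30-second window).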
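For readers who would rather not take the Ix dependency, the behaviour described above (like Aggregate, but one output per input) can be approximated with a small iterator method. This is a minimal sketch and not the library's implementation; `ScanSketch` and `RunningAggregate` are hypothetical names chosen for illustration, and it assumes the seed itself is not emitted.

```csharp
using System;
using System.Collections.Generic;

public static class ScanSketch
{
    // Hypothetical helper: behaves like Aggregate, but yields the accumulator
    // after every input element instead of only once at the end.
    public static IEnumerable<TAcc> RunningAggregate<T, TAcc>(
        this IEnumerable<T> source, TAcc seed, Func<TAcc, T, TAcc> step)
    {
        var acc = seed;
        foreach (var item in source)
        {
            acc = step(acc, item);
            yield return acc;
        }
    }
}
```

For example, `new[] { 1, 2, 3, 4 }.RunningAggregate(0, (a, x) => a + x)` yields 1, 3, 6, 10. The source is enumerated once and lazily, which is the property the answer relies on.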
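Answer 1 (score: 0)
Edit: as @PeterDuniho pointed out, this is not really a "rolling average". It is not recalculated each time a new item is added. It only provides a snapshot of the statistics every 30 seconds (or whatever windowWidth is set to). I'll leave this answer here for now in case it is useful, but it isn't really what was asked for.
I think the following solution should perform very quickly on large data sets (it should be O(n)). As a proof of concept, I ran it against a list of one million items and it completed in 0.782 seconds in LINQPad 4 (on a laptop that is definitely not state of the art).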
public IEnumerable<Stats> CalculateStats(
    List<Data> bufferData,
    DateTime startTime,
    TimeSpan windowWidth)
{
    return bufferData
        .Select(x => new
        {
            x.Value,
            WindowIndex = GetWindowIndex(x.Time, startTime, windowWidth)
        })
        // One group per fixed (non-overlapping) window.
        .GroupBy(
            x => x.WindowIndex,
            (i, items) => new Stats
            {
                StartTime = GetWindowTime(startTime, windowWidth, i),
                FinishTime = GetWindowTime(startTime, windowWidth, i + 1),
                Mean = (float)items.Average(x => x.Value),
                Max = (float)items.Max(x => x.Value),
                Min = (float)items.Min(x => x.Value)
            });
}
private int GetWindowIndex(DateTime time, DateTime startTime, TimeSpan windowWidth)
{
    var timeSinceStart = time - startTime;
    var secondsSinceStart = timeSinceStart.TotalSeconds;
    // Floor (not Ceiling) so that window i covers [start + i * width, start + (i + 1) * width),
    // which matches the StartTime and FinishTime reported by GetWindowTime.
    return (int)Math.Floor(secondsSinceStart / windowWidth.TotalSeconds);
}
private DateTime GetWindowTime(DateTime startTime, TimeSpan windowWidth, int windowIndex)
{
    return startTime + TimeSpan.FromSeconds(windowWidth.TotalSeconds * windowIndex);
}
public class Stats
{
    public DateTime StartTime { get; set; }
    public DateTime FinishTime { get; set; }
    public float Mean { get; set; }
    public float Max { get; set; }
    public float Min { get; set; }
}
public class Data
{
    public float Value { get; set; }
    public DateTime Time { get; set; }
}
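Answer 2 (score: 0)
I couldn't find a way to do this in fully linear time, but at least the repeated iteration only occurs within the window, not over the whole data set. This method yields every rolling (overlapping) 30-second window of data samples.
You can write it as an extension method or as a regular method. To simplify usage, I made it an extension method.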
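static IEnumerable<IEnumerable<T>> Windows<T>(this IEnumerable<T> self, Func<T, DateTime> selector, TimeSpan span) {
    // Dispose the enumerator when iteration ends or is abandoned.
    using (var enumerator = self.GetEnumerator()) {
        var samples = new List<T>();
        var start = DateTime.MinValue;
        while (enumerator.MoveNext()) {
            var end = selector(enumerator.Current);
            // Move the window start forward once the current sample is more than one span past it.
            if (end > start + span) {
                start = end - span;
            }
            // Rebuild the buffer without the samples that fell out of the window.
            samples = samples.SkipWhile(i => selector(i) < start).ToList();
            samples.Add(enumerator.Current);
            yield return samples;
        }
    }
}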
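Usage example: a rolling average (see note 1) over 30 seconds.
var rollingAverages = BufferData
    .Windows(d => d.Time, new TimeSpan(0, 0, 30))
    .Select(win => win.Average(d => d.Value));
This works by treating the current element as the end of the window, so the window starts out shorter than the given span but grows to the maximum over time.
Note 1: my English statistical terminology is a bit rusty; perhaps this is called a rolling mean?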
Answer 3 (score: 0)
Using a queue keeps the number of memory allocations to a minimum:
static IEnumerable<TimeSample> TimeRollingWindow(IEnumerable<Data> data, TimeSpan interval)
{
    Queue<Data> buffer = new Queue<Data>();
    foreach (var item in data)
    {
        buffer.Enqueue(item);
        // Remove the old data
        while (buffer.Count > 0 && (item.Time - buffer.Peek().Time > interval))
        {
            buffer.Dequeue();
        }
        float max = float.MinValue;
        float min = float.MaxValue;
        double sum = 0;
        foreach (var h in buffer)
        {
            sum += h.Value;
            max = Math.Max(max, h.Value);
            min = Math.Min(min, h.Value);
        }
        // spit it out
        yield return new TimeSample(buffer.Peek().Time, item.Time, min, max, (float)(sum / buffer.Count));
    }
}
class TimeSample
{
    public TimeSample(DateTime startTime, DateTime endTime, float min, float max, float mean)
    {
        StartTime = startTime;
        EndTime = endTime;
        Min = min;
        Max = max;
        Mean = mean;
    }
    public readonly DateTime StartTime;
    public readonly DateTime EndTime;
    public readonly float Min;
    public readonly float Max;
    public readonly float Mean;
}
class Data
{
    public Data(DateTime time, float value)
    {
        Time = time;
        Value = value;
    }
    public readonly DateTime Time;
    public readonly float Value;
}
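Answer 4 (score: 0)
Take a look at the MoreLINQ library on NuGet. You can compute running totals and averages by writing accumulator functions.
The trick to any "rolling window" aggregate is to write an accumulator function that keeps the values of the sequence in a queue buffer for as long as they remain within the desired window range. When sequence elements no longer meet the range criteria, they are dequeued from the buffer and their values are removed (de-accumulated) from any aggregate or running total.
Before I get into any code, I need to post a disclaimer: everything below was typed directly into the reply window, which means it may not even compile. The general concept is sound, but that is as much as I can guarantee.
Starting with your Data class and BufferData, plus the .Scan() function from MoreLINQ: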
// First we need a type to hold the results:
class Result
{
    public double min;
    public double max;
    public DateTime first; // needed for centre
    public DateTime centre;
    // Important because this is what really defines the window range:
    // this sample and everything 30s prior (or as determined by the InWindow predicate)
    public DateTime last;
    // for fun, because once we have the others, these are easy and fast to do at the same time
    public double sum;
    public int count;
    public double avg;
}
// we also want to define our window range
// For this example, the head of the queue is still part of the range if it's within 30 seconds of the current sample
Func<Data, Data, bool> InWindow = (head, cur) => (head.Time.AddSeconds(30) >= cur.Time);
// and a place to accumulate our buffer (hurray for closures!)
var accBuffer = new Queue<Data>();
// now get the data
IEnumerable<Data> BufferData = ...;
// let's get to it!
// Note: the same Result instance is mutated and yielded at every step, so read each result as it is produced.
var results = BufferData.Scan(new Result() { min = double.MaxValue, max = double.MinValue },
    (acc, data) =>
    {
        // Use a flag to avoid iterating the queue if possible
        bool minmaxValid = true;
        while (accBuffer.Count > 0 && !InWindow(accBuffer.Peek(), data))
        {
            var old = accBuffer.Dequeue();
            acc.sum -= old.Value;
            acc.count--;
            // once an old min or max falls out of the window, we'll have to re-check the entire window :(
            if (old.Value == acc.min) minmaxValid = false;
            if (old.Value == acc.max) minmaxValid = false;
        }
        accBuffer.Enqueue(data);
        acc.count++;
        acc.sum += data.Value;
        acc.first = accBuffer.Peek().Time;
        acc.last = data.Time;
        // DateTime - DateTime is already a TimeSpan
        acc.centre = acc.first.AddTicks((data.Time - acc.first).Ticks / 2);
        if (minmaxValid && data.Value < acc.min) acc.min = data.Value;
        if (minmaxValid && data.Value > acc.max) acc.max = data.Value;
        // have to check the whole queue :(
        if (!minmaxValid)
        {
            acc.min = double.MaxValue;
            acc.max = double.MinValue;
            // could use accBuffer.Max() and accBuffer.Min(), but this avoids iterating the queue twice
            foreach (var d in accBuffer)
            {
                if (d.Value < acc.min) acc.min = d.Value;
                if (d.Value > acc.max) acc.max = d.Value;
            }
        }
        acc.avg = acc.sum / acc.count;
        // Scan needs the accumulator handed back for the next step
        return acc;
    });
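What makes this solution special is that it is highly efficient. It isn't O(n), but it comes very close!
The remaining obstacle to reaching O(n) is the need to iterate the window queue when (and only when) the maximum or minimum falls out of the window. I don't think it is possible to eliminate this entirely, but I feel there is still room for improvement here if you can find a way to avoid it. Depending on how many elements you need to keep and the relative size of each element, you might actually do better by using some kind of sorting algorithm... but I doubt it. Unlike sum, count and average, min and max are simply hard to do efficiently in this situation.
Finally, I didn't know this at first, but thanks to @Enigmativity's answer I now see that the Scan() operator I'm using has been incorporated into an MS-maintained library. Using it via NuGet instead of MoreLINQ is almost a drop-in replacement... the code I posted here doesn't really change at all; you just need the right using directive in the file.
Hmm... when I checked on this, MoreLINQ had been updated recently anyway, so maybe it doesn't matter.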
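On the min/max obstacle described above: one commonly used technique for it, which is not part of the original answer, is a monotonic deque. It keeps only the samples that could still become the window minimum (and, in a second deque, the maximum), so each sample is added and removed at most once and the full-window rescan goes away. The following is a minimal sketch under stated assumptions: samples arrive in non-decreasing time order, the span is non-negative, and the window rule mirrors the InWindow predicate (a sample stays while its time plus the span is at or after the current time). `RollingExtremes`, `Sample` and `MinMax` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RollingExtremes
{
    // Hypothetical sample shape used only for this sketch.
    public sealed class Sample
    {
        public Sample(DateTime time, float value) { Time = time; Value = value; }
        public DateTime Time { get; }
        public float Value { get; }
    }

    // For each sample, yields the min and max of all samples no older than `span`.
    // Assumes `source` is ordered by non-decreasing Time and `span` is non-negative.
    public static IEnumerable<(float Min, float Max)> MinMax(
        IEnumerable<Sample> source, TimeSpan span)
    {
        if (span < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(span));

        // LinkedList<T> is used here as a double-ended queue.
        var minCandidates = new LinkedList<Sample>(); // values increase front to back
        var maxCandidates = new LinkedList<Sample>(); // values decrease front to back

        foreach (var s in source)
        {
            // Drop candidates at the back that can never be the extreme again:
            // the new sample is at least as extreme and will expire later.
            while (minCandidates.Count > 0 && minCandidates.Last.Value.Value >= s.Value)
                minCandidates.RemoveLast();
            while (maxCandidates.Count > 0 && maxCandidates.Last.Value.Value <= s.Value)
                maxCandidates.RemoveLast();
            minCandidates.AddLast(s);
            maxCandidates.AddLast(s);

            // Expire candidates at the front that have left the time window.
            // The current sample is always in the window, so the lists never empty.
            while (minCandidates.First.Value.Time + span < s.Time)
                minCandidates.RemoveFirst();
            while (maxCandidates.First.Value.Time + span < s.Time)
                maxCandidates.RemoveFirst();

            yield return (minCandidates.First.Value.Value, maxCandidates.First.Value.Value);
        }
    }
}
```

Because every sample enters and leaves each list at most once, the total work is proportional to the number of samples. Combined with the running sum and count from the code above, that would make the whole pass linear, at the cost of two extra buffers.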
Answer 5 (score: 0)
I'm not sure I fully understand the desired output, but here is my shot at it.
// Some mock data...
var data = new List<Sample>
{
    new Sample { Time = new DateTime(2016, 1, 1, 0, 1, 00), Value = 10 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 2, 00), Value = 11 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 2, 20), Value = 17 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 2, 30), Value = 13 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 3, 00), Value = 18 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 3, 10), Value = 12 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 4, 00), Value = 19 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 4, 25), Value = 12 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 4, 55), Value = 11 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 5, 00), Value = 12 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 6, 00), Value = 14 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 8, 03), Value = 13 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 8, 44), Value = 17 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 9, 01), Value = 18 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 10, 32), Value = 19 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 10, 54), Value = 15 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 11, 00), Value = 10 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 11, 05), Value = 16 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 11, 10), Value = 14 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 11, 13), Value = 16 },
    new Sample { Time = new DateTime(2016, 1, 1, 0, 11, 32), Value = 15 },
};
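// The code...
var range = new TimeSpan(0, 0, 0, 30);
var results = data
    .Select(sample => new
    {
        Time = sample.Time,
        Set = data.Where(relatedSample => relatedSample.Time >= (sample.Time - range) && relatedSample.Time <= (sample.Time + range))
            .Select(relatedSample => relatedSample.Value)
            // Materialize the window once; otherwise each statistic below would re-run the Where over the whole list.
            .ToList()
    })
    .Select(stat => new
    {
        Time = stat.Time,
        Avg = stat.Set.Average(),
        Min = stat.Set.Min(),
        Max = stat.Set.Max(),
        Count = stat.Set.Count
    });
This returns an enumerable containing, for each sample, the minimum, maximum, average and sample count over the 30 seconds before and after it. It may not be the most efficient approach, but it is very simple. It retrieves the "window" of samples into a temporary list and then performs the statistics on that list. So at least it doesn't run against the whole list multiple times for each sample, only once per sample, although each window can of course contain many samples.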
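If the samples are already sorted by time, the scan of the whole list for every sample can be avoided with two indices that only ever move forward, which is close to the beginWindow/endWindow iterator idea in the question. The following is a minimal sketch under that assumption and with a non-negative range; `CenteredWindowStats`, `Point` and `Compute` are hypothetical names, and it reports only count and mean to keep the example short (min and max would still need a pass over the window or a smarter structure).

```csharp
using System;
using System.Collections.Generic;

public static class CenteredWindowStats
{
    // Hypothetical sample shape used only for this sketch.
    public sealed class Point
    {
        public DateTime Time { get; set; }
        public double Value { get; set; }
    }

    // For each sample, returns the count and mean of all samples whose time lies
    // within [Time - range, Time + range]. Assumes `data` is sorted by Time.
    public static List<(DateTime Time, int Count, double Mean)> Compute(
        IReadOnlyList<Point> data, TimeSpan range)
    {
        var results = new List<(DateTime Time, int Count, double Mean)>(data.Count);
        int begin = 0;   // first index inside the window
        int end = 0;     // one past the last index inside the window
        double sum = 0;  // running sum of the values in data[begin..end)

        for (int i = 0; i < data.Count; i++)
        {
            DateTime centre = data[i].Time;
            // Grow the window on the right while samples are not too late.
            while (end < data.Count && data[end].Time <= centre + range)
                sum += data[end++].Value;
            // Shrink the window on the left while samples are too early.
            while (data[begin].Time < centre - range)
                sum -= data[begin++].Value;

            int count = end - begin; // at least 1: the sample itself
            results.Add((centre, count, sum / count));
        }
        return results;
    }
}
```

Each index advances at most once per sample over the whole run, so the cost is linear in the number of samples rather than quadratic. It needs indexed access, which is why the sketch takes an IReadOnlyList rather than an IEnumerable.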