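Rolling time window data in Scala

Time: 2018-10-25 05:40:30

Tags: scala aggregate sliding-window

Below is a simplified Scala snippet that generates a sample Day -> data map and tries to compute the data for a 3-day rolling time window: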

import scala.collection.immutable.TreeMap

// Sorted map of Day1..Day7 to their data
val dataByDay: Map[String, String] = TreeMap((1 to 7).map(i => (s"Day$i" -> s"Data-$i")): _*)

val groupedIterator: Iterator[(Int, Map[String, String])] = dataByDay.sliding(3).zipWithIndex.map(e => ((e._2 + 1) -> e._1))

for ((day, lastThreeDaysData) <- groupedIterator) {
  println(s"On Day${day} data for days " + lastThreeDaysData.keys.mkString(",") + " will be used")
}

The code above prints:

On Day1 data for days Day1,Day2,Day3 will be used
On Day2 data for days Day2,Day3,Day4 will be used
On Day3 data for days Day3,Day4,Day5 will be used
On Day4 data for days Day4,Day5,Day6 will be used
On Day5 data for days Day5,Day6,Day7 will be used

The data needs to be processed as follows instead:

On Day1 data for days will be used
On Day2 data for days Day1 will be used
On Day3 data for days Day1,Day2 will be used
On Day4 data for days Day1,Day2,Day3 will be used
On Day5 data for days Day2,Day3,Day4 will be used
On Day6 data for days Day3,Day4,Day5 will be used
On Day7 data for days Day4,Day5,Day6 will be used

Please suggest an approach.

2 answers:

Answer 0 (score: 2)

Your requirements are a bit vague. If that output is all you need, a simple solution looks like this.

(1 to 7).foreach { day =>
  val prior = Seq(day-3,day-2,day-1).filter(_>0).map("Day" + _)
  println(s"On Day$day data for days ${prior.mkString(",")} will be used")
}

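If the requirement is a data representation of a configurable rolling window, more precise information is needed.

Answer 1 (score: 0)

I assume this code only illustrates the problem and that your actual requirements are different.

I'm providing a solution for streams; you can use an approach similar to the following to get this special window implementation for your use case.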
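One possible reading of "configurable" is sketched below, assuming the goal is to pair each day with the data of up to `window` days before it (excluding the day itself). The names `PriorWindow` and `priorWindows` are hypothetical and not part of the question.

```scala
import scala.collection.immutable.TreeMap

object PriorWindow {
  // Hypothetical helper: for each entry (in order), return its key together
  // with the up-to-`window` entries that come immediately before it.
  def priorWindows[K, V](data: Vector[(K, V)], window: Int): Seq[(K, Vector[(K, V)])] =
    data.indices.map { i =>
      val from = math.max(0, i - window) // clamp at the start of the data
      data(i)._1 -> data.slice(from, i)  // slice excludes index i itself
    }

  def main(args: Array[String]): Unit = {
    val dataByDay = TreeMap((1 to 7).map(i => s"Day$i" -> s"Data-$i"): _*)
    for ((day, prior) <- priorWindows(dataByDay.toVector, 3))
      println(s"On $day data for days ${prior.map(_._1).mkString(",")} will be used")
  }
}
```

Note that a `TreeMap` with `String` keys sorts lexicographically, so a key such as `Day10` would sort before `Day2`; numeric keys avoid this if more than nine days are involved.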

import scala.collection.mutable

val stream = {
  def loop(i: Int): Stream[(String, String)] = (s"Day$i", s"Data$i") #:: loop(i + 1)
  loop(1)
}

def specialWindowedStream[T](source: Stream[T], window: Int): Stream[List[T]] = {
  val queue = mutable.Queue.empty[T]
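  def loop(source: Stream[T]): Stream[List[T]] =
    if (source.isEmpty) Stream.empty // stop cleanly on a finite stream
    else {
      queue.enqueue(source.head)
      // keep at most `window` elements in the queue
      if (queue.size > window) {
        queue.dequeue()
      }
      queue.toList #:: loop(source.tail)
    }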

  loop(source)
}

val windowedStream = specialWindowedStream(stream, 5)

windowedStream.zipWithIndex.take(6).foreach(println)
// (List((Day1,Data1)),0)
// (List((Day1,Data1), (Day2,Data2)),1)
// (List((Day1,Data1), (Day2,Data2), (Day3,Data3)),2)
// (List((Day1,Data1), (Day2,Data2), (Day3,Data3), (Day4,Data4)),3)
// (List((Day1,Data1), (Day2,Data2), (Day3,Data3), (Day4,Data4), (Day5,Data5)),4)
// (List((Day2,Data2), (Day3,Data3), (Day4,Data4), (Day5,Data5), (Day6,Data6)),5)
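The windows above include the current element. If only the preceding elements are wanted, as in the question's expected output, the same idea can be sketched lazily without a mutable queue by using `scanLeft` on an `Iterator`. This is a minimal sketch under that assumption; `PriorWindowLazy` and `withPrior` are hypothetical names.

```scala
object PriorWindowLazy {
  // Hypothetical helper: pairs each element with the up-to-`window` elements
  // that preceded it. The running state holds the last `window + 1` elements.
  def withPrior[T](source: Iterator[T], window: Int): Iterator[(T, Vector[T])] =
    source
      .scanLeft(Vector.empty[T])((acc, x) => (acc :+ x).takeRight(window + 1))
      .drop(1)                    // skip the initial empty state
      .map(v => (v.last, v.init)) // current element, then the ones before it

  def main(args: Array[String]): Unit = {
    val days = Iterator.from(1).map(i => s"Day$i") // unbounded source
    for ((day, prior) <- withPrior(days, 3).take(7))
      println(s"On $day data for days ${prior.mkString(",")} will be used")
  }
}
```

Because `Iterator.scanLeft` is lazy, this also works on an unbounded source as long as only a finite prefix is consumed.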