I have a message source like this:
1 | red | light | 10
2 | blue | dark | 20
1 | brown | light | 2
1 | red | light | 10
20 | grey | dark | 200
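I want to find out (true / false) whether the source contains any identical items. In the stream above, 1 | red | light | 10 appears twice. The stream can be very large, with more than 2 million records. As soon as an identical item is found, I can return true immediately (i.e., in the example above we can avoid reading 20 | grey | dark | 200).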
What is the best way to do this? I tried reading the whole source into a List(String) and running distinct on it. That works, but for large sources I start getting OOM errors.
val restResult: Future[immutable.Seq[Color]] =
  mySource(ctx)
    .drop(1)
    .via(framing("\n"))
    .map(_.utf8String)
    .map(_.trim)
    .map(s => ColorParser(s))
    .collect {
      case Right(color) => color
    }
    .runWith(Sink.seq)
Answer 0 (score: 0)
Here is an example that checks a list of colors for identical items:
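import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.mutable
import scala.util.{Failure, Success}

// Assumes Akka 2.6+, where an implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("duplicates")
import system.dispatcher // ExecutionContext for onComplete

case class Color(n: String, name: String, lightOrDark: String, n2: String)

val k1 = Color("1", "red", "light", "10")
val k2 = Color("1", "blue", "dark", "11")
val k3 = Color("1", "orange", "dark", "11")
val k4 = Color("1", "red", "light", "10")
val k5 = Color("1", "red", "dark", "200")

println(k1.hashCode() == k2.hashCode())
println(k1.hashCode() == k4.hashCode()) // true: equal case-class instances share a hash code

// Hash codes seen so far. Storing Ints keeps memory low, but two different
// colors can share a hash code, so a rare false positive is possible.
val set = mutable.Set.empty[Int]
val colorList = List(k1, k2, k3, k4, k5)

val restResult =
  Source
    .fromIterator(() => colorList.iterator)
    .map { color =>
      val hashCode = color.hashCode()
      val res = !set.contains(hashCode) // true while the color is new
      set += hashCode
      res
    }
    .takeWhile(identity, inclusive = true) // stop right after the first repeat
    .runWith(Sink.last)

// The result is false if a duplicate was found and true if every item was unique.
restResult.onComplete {
  case Success(value) =>
    println(value)
    system.terminate()
  case Failure(e) =>
    e.printStackTrace()
    system.terminate()
}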
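If false positives from hash collisions are a concern, one possible variation is to keep the elements themselves in the set and keep that set inside the stage. The following is only a sketch, assuming Akka 2.6 or later on the classpath; hasDuplicates and DuplicateCheck are hypothetical names, not part of the original answer. It completes with true as soon as a repeat is seen, so the rest of the source is never read. It uses more memory than storing hash codes, because every distinct element stays in the set.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object DuplicateCheck {
  // Hypothetical helper: true as soon as an element equal to an earlier one
  // shows up, false if the source ends without any repeat.
  def hasDuplicates[T](source: Source[T, _])(implicit system: ActorSystem): Future[Boolean] = {
    import system.dispatcher
    source
      .statefulMapConcat[T] { () =>
        // A fresh set per materialization, so runs do not share state.
        val seen = mutable.HashSet.empty[T]
        elem =>
          // add returns false when the element was already in the set;
          // only such repeats are emitted downstream.
          if (seen.add(elem)) Nil else List(elem)
      }
      .runWith(Sink.headOption) // cancels upstream after the first repeat
      .map(_.isDefined)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("duplicate-check")
    val rows = List(
      "1 | red | light | 10",
      "2 | blue | dark | 20",
      "1 | brown | light | 2",
      "1 | red | light | 10",
      "20 | grey | dark | 200"
    )
    try {
      println(Await.result(hasDuplicates(Source(rows)), 10.seconds))
      println(Await.result(hasDuplicates(Source(rows.distinct)), 10.seconds))
    } finally system.terminate()
  }
}
```

With the sample rows from the question, the first call prints true and the second, which runs on the de-duplicated list, prints false. The same helper could be applied to a Source[Color, _], since case classes compare by value.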
The full source is here.