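We have a Spark Streaming application that consumes events from Kafka. We want to aggregate events by traceid over a period of time, create one aggregated event per traceid, and write that aggregated event to the database.
Our events look like this: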
traceid: 123
{
info: abc;
}
traceid: 123
{
info:bcd;
}
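What we want to achieve is to build one aggregated event over a period of time, say 2 minutes, and then write that aggregated event to the database instead of the individual events: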
traceid: 123
{
info:abc,bcd
}
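To make the target concrete, here is a minimal plain-Scala sketch (no Spark involved) of the merge we expect for each traceid. The `Event` case class and the `aggregate` helper are hypothetical names used only for illustration; they are not part of our application.

```scala
// Hypothetical event shape, for illustration only.
final case class Event(traceId: String, info: String)

object AggregateSketch {
  // Merge all events that share a traceid into one event whose info
  // is the comma-joined list of the individual info values, in arrival order.
  def aggregate(events: Seq[Event]): Map[String, Event] =
    events.groupBy(_.traceId).map { case (id, es) =>
      id -> Event(id, es.map(_.info).mkString(","))
    }

  def main(args: Array[String]): Unit = {
    val merged = aggregate(Seq(Event("123", "abc"), Event("123", "bcd")))
    println(merged("123")) // Event(123,abc,bcd)
  }
}
```

In the streaming job, the same merge should happen over the events seen within the 2-minute window, with only the merged result reaching the database.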
We are using mapWithState and came up with this code:
def trackStateFunc(batchTime: Time, id: String, url: Option[MetricTypes.EnrichedKeyType], state: State[SessionData]): Option[(String, String, Long, immutable.Map[String, String])] = {
  val enrichedId = id
  if (url.isDefined) {
    // Unpack the fields of the incoming event.
    val accountId = url.get._1.asInstanceOf[String]
    val reducedText = url.get._2.asInstanceOf[String]
    val commonIDS = url.get._3.asInstanceOf[String]
    val deviceId = url.get._4.asInstanceOf[String]
    val ets = url.get._5.toString.toLong
    val eventId = url.get._6.asInstanceOf[String]
    val attributeMap = Map(
      eventId -> reducedText,
      "common_ids" -> commonIDS,
      "common_enriched_physicalDeviceId" -> deviceId
    )
    if (state.exists) {
      // Merge the new attributes into those already stored for this key.
      val newState = state.get.attributeMap ++ attributeMap
      state.update(SessionData(newState))
      Some((accountId, enrichedId, ets, newState))
    } else {
      // First event for this key: start a new session.
      state.update(SessionData(attributeMap))
      Some((accountId, enrichedId, ets, attributeMap))
    }
  } else {
    None
  }
}
val stateSpec = StateSpec.function(trackStateFunc _).timeout(Minutes(2))
val requestsWithState = tempLines.mapWithState(stateSpec)
requestsWithState.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    val connection = createNewConnection()
    partitionOfRecords.foreach {
      case (accountId, enrichedId, ets, attributeMap) =>
        if (validateRecordForStorage(accountId, enrichedId, ets, attributeMap)) {
          val ds = new DBDataStore(connection)
          ds.saveEnrichedEvent(accountId, enrichedId, ets, attributeMap)
        } else {
          /*logError("Discarded record [enrichedId=" + enrichedId
            + ", accountId=" + accountId
            + ", ets=" + ets
            + ", attributes=" + attributeMap.toString() + "]")*/
          println("Discarded record [enrichedId=" + enrichedId
            + ", accountId=" + accountId
            + ", ets=" + ets
            + "]")
        }
      case default =>
        logInfo("You gave me: " + default)
    }
  }
}
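The mapWithState aggregation itself works fine. Our understanding was that it would only start writing to the database after 2 minutes, but we noticed that it starts writing to the database immediately, without waiting for the 2 minutes. So our understanding is evidently incorrect. If someone could guide us on how to achieve our goal of writing to the database only after aggregating for 2 minutes, it would be a great help.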