我正在使用Spark 2.2
我觉得我这里有些奇怪。基本前提是
下面的代码
String txt = "Some formatted string! \r\n\tLook there should be a new line
here!\r\n\r\n\tAndthere should be 2 new lines here!"
m_wordprocessingDocument.MainDocumentPart.Document.Body.AppendChild(
new Paragraph(
new Run(
new Text(txt))));
(我已经清理/缩短了下面的ID,它们更大,但格式相似)
我看到的是输出
Output#1有一组uId(作为String)出现
implicit val mapEncoder = Encoders.kryo[java.util.HashMap[String, Any]]
implicit val recommendationEncoder = Encoders.kryo[Recommendation]
val mapper = new ObjectMapper()
val kieOuts = uberDs.map(profile => {
val map = mapper.convertValue(profile, classOf[java.util.HashMap[String, Any]])
val profile = Profile(map)
// setup the kie session
val ks = KieServices.Factory.get
val kContainer = ks.getKieClasspathContainer
val kSession = kContainer.newKieSession() //TODO: stateful session, how to do stateless?
// insert profile object into kie session
val kCmds = ks.getCommands
val cmds = new java.util.ArrayList[Command[_]]()
cmds.add(kCmds.newInsert(profile))
cmds.add(kCmds.newFireAllRules("outFired"))
// fire kie rules
val results = kSession.execute(kCmds.newBatchExecution(cmds))
val fired = results.getValue("outFired").toString.toInt
// collect the inserted recommendation objects and create uid string
import scala.collection.JavaConversions._
var gresults = kSession.getObjects
gresults = gresults.drop(1) // drop the inserted profile object which also gets collected
val recommendations = scala.collection.mutable.ListBuffer[Recommendation]()
gresults.toList.foreach(reco => {
val recommendation = reco.asInstanceOf[Recommendation]
recommendations += recommendation
})
kSession.dispose
val uIds = StringBuilder.newBuilder
if(recommendations.size > 0) {
recommendations.foreach(recommendation => {
uIds.append(recommendation.getOfferId + "_" + recommendation.getScore)
uIds.append(";")
})
uIds.deleteCharAt(uIds.size - 1)
}
new ORecommendation(profile.getAttributes().get("cId").toString.toLong, fired, uIds.toString)
})
println("======================Output#1======================")
kieOuts.show(1000, false)
println("======================Output#2======================")
kieOuts.collect.foreach(println)
//separating cid and and each uid into individual rows
val kieOutsDs = kieOuts.as[(Long, Int, String)]
println("======================Output#3======================")
kieOutsDs.show(1000, false)
Output#2会显示一组相似的uId(通常只有1个元素)
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|cId |rulesFired | eligibleUIds |
|842 | 17|123-25_2.0;12345678-48_9.0;28a-ad_5.0;123-56_10.0;123-27_2.0;123-32_3.0;c6d-e5_5.0;123-26_2.0;123-51_10.0;8e8-c1_5.0;123-24_2.0;df8-ad_5.0;123-36_5.0;123-16_2.0;123-34_3.0|
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Output#3与#Output1
ORecommendation(842,17,123-36_5.0;123-24_2.0;8e8-c1_5.0;df8-ad_5.0;28a-ad_5.0;660-73_5.0;123-34_3.0;123-48_9.0;123-16_2.0;123-51_10.0;123-26_2.0;c6d-e5_5.0;123-25_2.0;123-56_10.0;123-32_3.0)
每次运行它时,Output#1和Output#2之间的区别是1个元素,但永远不会是同一元素(在上面的示例中,Output#1具有 123-27_2.0 ,但输出#2具有 660-73_5.0 )
它们应该不一样吗?我还是Scala / Spark的新手,感觉好像我缺少了一些非常基本的东西
答案 0 :(得分:0)
我想我明白了,向kieOuts添加 cache 至少使我在show和collect之间具有相同的输出。 我将研究为什么KIE每次运行相同的输入都会给我不同的输出,但这是一个不同的问题