Spark: show and collect give different output

Date: 2018-08-07 13:47:09

Tags: scala apache-spark apache-spark-2.0 kie

I'm using Spark 2.2.

I think I'm seeing something strange here. The basic premise is:

  • I have a set of KIE/Drools rules that run over a Dataset of profile objects
  • I then try to show/collect and print out the results
  • I then convert the output to tuples so I can flatMap it later

The code is below:

(I've cleaned up/shortened the IDs below; they are longer, but similarly formatted)

What I'm seeing in the output:

Output#1 has a set of uIds (as a String) come out

implicit val mapEncoder = Encoders.kryo[java.util.HashMap[String, Any]]
implicit val recommendationEncoder = Encoders.kryo[Recommendation]
val mapper = new ObjectMapper()

val kieOuts = uberDs.map(profile => {
  val map = mapper.convertValue(profile, classOf[java.util.HashMap[String, Any]])
  val profile = Profile(map)

  // setup the kie session
  val ks = KieServices.Factory.get
  val kContainer = ks.getKieClasspathContainer
  val kSession = kContainer.newKieSession() //TODO: stateful session, how to do stateless?

  // insert profile object into kie session
  val kCmds = ks.getCommands
  val cmds = new java.util.ArrayList[Command[_]]()
  cmds.add(kCmds.newInsert(profile))
  cmds.add(kCmds.newFireAllRules("outFired"))

  // fire kie rules
  val results = kSession.execute(kCmds.newBatchExecution(cmds))
  val fired = results.getValue("outFired").toString.toInt

  // collect the inserted recommendation objects and create uid string
  import scala.collection.JavaConversions._
  var gresults = kSession.getObjects
  gresults = gresults.drop(1) // drop the inserted profile object which also gets collected

  val recommendations = scala.collection.mutable.ListBuffer[Recommendation]()
  gresults.toList.foreach(reco => {
    val recommendation = reco.asInstanceOf[Recommendation]
    recommendations += recommendation
  })
  kSession.dispose
  val uIds = StringBuilder.newBuilder
  if(recommendations.size > 0) {
    recommendations.foreach(recommendation => {
      uIds.append(recommendation.getOfferId + "_" + recommendation.getScore)
      uIds.append(";")
    })
    uIds.deleteCharAt(uIds.size - 1)
  }

  new ORecommendation(profile.getAttributes().get("cId").toString.toLong, fired, uIds.toString)
})
println("======================Output#1======================")
kieOuts.show(1000, false)
println("======================Output#2======================")
kieOuts.collect.foreach(println)

//separating cId and each uId into individual rows
val kieOutsDs = kieOuts.as[(Long, Int, String)]
println("======================Output#3======================")
kieOutsDs.show(1000, false)

Output#1 shows:

+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|cId |rulesFired |    eligibleUIds   |
|842 |         17|123-25_2.0;12345678-48_9.0;28a-ad_5.0;123-56_10.0;123-27_2.0;123-32_3.0;c6d-e5_5.0;123-26_2.0;123-51_10.0;8e8-c1_5.0;123-24_2.0;df8-ad_5.0;123-36_5.0;123-16_2.0;123-34_3.0|
+----+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Output#2 shows a similar set of uIds (usually off by just one element):

ORecommendation(842,17,123-36_5.0;123-24_2.0;8e8-c1_5.0;df8-ad_5.0;28a-ad_5.0;660-73_5.0;123-34_3.0;123-48_9.0;123-16_2.0;123-51_10.0;123-26_2.0;c6d-e5_5.0;123-25_2.0;123-56_10.0;123-32_3.0)

Output#3 matches Output#1.
  • Every time I run it, Output#1 and Output#2 differ by one element, but it is never the same element (in the example above, Output#1 has 123-27_2.0 but Output#2 has 660-73_5.0)

  • Shouldn't they be identical? I'm still new to Scala/Spark and it feels like I'm missing something very basic
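For context on why this can happen at all: a Spark Dataset is lazy, and each action (show, collect) re-runs the whole map. If the mapping function is non-deterministic (as the KIE session here seems to be), each action can legitimately see different results. No Spark is needed to see the shape of the problem; here is a plain-Scala sketch using a lazy view as an analogy (the names are illustrative, not from the code above):

```scala
import scala.util.Random

// A Scala view is lazy, like a Spark Dataset: the mapping function is
// re-executed on every traversal, i.e. on every "action".
val rng = new Random()
val lazyIds = (1 to 5).view.map(_ => rng.nextInt())

val firstPass  = lazyIds.toList // analogous to kieOuts.show(...)
val secondPass = lazyIds.toList // analogous to kieOuts.collect

// With a non-deterministic mapping function the two traversals
// almost surely produce different elements.
println(firstPass == secondPass) // almost surely false
```

Both traversals run `rng.nextInt()` again from scratch, just as show and collect each re-execute the map over the profiles.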

1 answer:

Answer 0: (score: 0)

I think I figured it out: adding cache to kieOuts at least gets me the same output from show and collect. I'll look into why KIE gives different output on every run over the same input, but that's a separate question.
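To sketch why caching fixes the mismatch (again a plain-Scala analogy, no Spark, illustrative names only): calling cache/persist on the Dataset before the first action means the map runs once and both show and collect read the same materialized rows, which in lazy-view terms looks like this:

```scala
import scala.util.Random

// Plain-Scala analogue of adding cache to kieOuts: materialize the
// lazy, non-deterministic computation once, then read it twice.
val rng = new Random()
val lazyIds = (1 to 5).view.map(_ => rng.nextInt())

val cached = lazyIds.toList // runs the mapping function exactly once

val shown     = cached // analogous to cached kieOuts.show(...)
val collected = cached // analogous to cached kieOuts.collect

println(shown == collected) // always true: both read the same materialized data
```

In the Spark code this would presumably amount to calling kieOuts.cache() (or .persist()) before the first show, though note that cache is best-effort: if cached partitions are evicted, Spark recomputes them, so a non-deterministic map remains a risk.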