How to run a specific ElasticSearch query based on Spark Streaming-Kafka messages

Asked: 2019-03-25 03:59:53

Tags: apache-spark elasticsearch spark-streaming spark-streaming-kafka

I am using Spark Streaming with Kafka, and I want to support real-time ElasticSearch search with a query built from each Kafka message.
The code looks like this:

import com.google.gson.Gson
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils
import org.elasticsearch.spark._

def creatingFuncTest(): StreamingContext = {

  val ssc = new StreamingContext(sc, Seconds(duration.toInt))
  ssc.checkpoint(checkpointDir)
  val kafkaParams = KafkaUtil.getKafkaParam(brokers, appName)

  val topics = actionTopicList.split(",").toSet

  val foodMessages = KafkaUtils
    .createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      kafkaParams,
      topics
    )


  val foodBatch: DStream[(String, Float, Float)] =
    foodMessages
      .filter(_._2.nonEmpty)
      .map { msg =>
        try {
          println("___________ msg :" + msg._2)
          val gson = new Gson()
          val vo = gson.fromJson(msg._2, classOf[PoiMsg])
          (vo.person_id.toString, vo.latitude.toFloat, vo.longitude.toFloat)
        } catch {
          case e: Exception =>
            println("____________" + e.getMessage)
            ("", 0.0f, 0.0f)
        }
      }
      .filter(_._1.nonEmpty)



  foodBatch.foreachRDD(rdd => {
    rdd.foreach(t => {
      val lat = t._2
      val lon = t._3

      val query: String =
        s"""{
           |  "filter" : {
           |    "geo_distance" : {
           |      "distance" : "200km",
           |      "pin.location" : { "lat" : "$lat", "lon" : "$lon" }
           |    }
           |  }
           |}""".stripMargin

      // This is the part I know is wrong: sc is not usable inside rdd.foreach
      val esRdd = sc.esRDD("recommend_diet_menu/fooddocument", query)

      println(esRdd.count())
    })
  })
  ssc

}

I know that generating a new RDD inside another RDD's foreach is wrong, but what is the correct way to do this?
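Two observations, neither from the original post. First, the closure passed to `rdd.foreach` runs on the executors, where `sc` (and therefore `sc.esRDD`) is not usable; the lookup would have to be issued from driver-side code inside `foreachRDD` (for example after collecting the coordinates to the driver), or from the executors through a plain ElasticSearch HTTP client instead of `sc.esRDD`. Second, the interpolation in the query string produces the literal text `{…}` and `#lon` rather than the coordinate values. A minimal, framework-free sketch of building the `geo_distance` query with correct Scala string interpolation (the index name and `200km` distance are taken from the question; `GeoQueryBuilder` is just an illustrative name):

```scala
object GeoQueryBuilder {
  // Build the geo_distance filter from the question. Note the interpolation:
  // "$lat" / "$lon" insert the values, whereas "{$lat}" would keep literal
  // braces and "{#lon}" would not interpolate at all.
  def buildQuery(lat: Float, lon: Float, distance: String = "200km"): String =
    s"""{
       |  "filter" : {
       |    "geo_distance" : {
       |      "distance" : "$distance",
       |      "pin.location" : { "lat" : "$lat", "lon" : "$lon" }
       |    }
       |  }
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // Example coordinates; prints the JSON body that would be passed to esRDD.
    println(buildQuery(39.9042f, 116.4074f))
  }
}
```

The resulting string is what would be handed to `sc.esRDD("recommend_diet_menu/fooddocument", query)` on the driver.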

0 Answers:

No answers yet.