Passing query parameters from Spark Streaming to Neo4j in Scala

Date: 2018-05-02 09:38:30

Tags: apache-spark neo4j cypher spark-streaming

I am trying to execute a Cypher query using the Spark-Neo4j connector. I want to pass parameters to this query from a data stream produced by Kafka, and the result of the Cypher query should come back as DataFrame fields. The connection to Neo4j is established successfully, and my query works fine with a plain Spark context. However, the same code does not work with a streaming context. Is there anything different about configuring the Neo4j connection when using Spark Streaming?

Below is the code for the streaming context. I am not using Kafka as a producer here; the parameter values are defined in the data array in order to test the connection and the query itself:

    import java.io.File

    import com.typesafe.config.ConfigFactory
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.neo4j.spark.Neo4j

    val sparkSession = SparkSession
      .builder()
      .appName("KafkaSparkStreaming")
      .master("local[*]")
      .getOrCreate()

    // Neo4j connection settings are read from a local Typesafe Config file
    val neo4jLocalConfig = ConfigFactory.parseFile(new File("configs/local_neo4j.conf"))

    // Bolt settings applied to the already-created session
    sparkSession.conf.set("spark.neo4j.bolt.url", neo4jLocalConfig.getString("neo4j.url"))
    sparkSession.conf.set("spark.neo4j.bolt.user", neo4jLocalConfig.getString("neo4j.user"))
    sparkSession.conf.set("spark.neo4j.bolt.password", neo4jLocalConfig.getString("neo4j.password"))

    val streamingContext = new StreamingContext(sparkSession.sparkContext, Seconds(3))

    val neo = Neo4j(streamingContext.sparkContext)

    // Test parameters standing in for a Kafka record: member id, latitude, longitude
    val data = Array("18731", "41.84000015258789", "-87.62999725341797")

    val query = "MATCH (m:Member)-[mtg_r:MT_TO_MEMBER]->(mt:MemberTopics)-[mtt_r:MT_TO_TOPIC]->(t:Topic), (t1:Topic)-[tt_r:GT_TO_TOPIC]->(gt:GroupTopics)-[tg_r:GT_TO_GROUP]->(g:Group)-[h_r:HAS]->(e:Event)-[a_r:AT]->(v:Venue) WHERE mt.topic_id = gt.topic_id AND distance(point({ longitude: {lon}, latitude: {lat}}),point({ longitude: v.lon, latitude: v.lat })) < 4000 AND mt.member_id = {id} RETURN distinct g.group_name as group_name, e.event_name as event_name, v.venue_name as venue_name"

    val paramsMap = Map("lat" -> data(1).toDouble, "lon" -> data(2).toDouble, "id" -> data(0).toInt)

    val df = neo.cypher(query, paramsMap).loadDataFrame("group_name" -> "string", "event_name" -> "string", "venue_name" -> "string")
    df.show()

    streamingContext.start()
    streamingContext.awaitTermination()
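
For reference, the snippet reads three keys from configs/local_neo4j.conf via the Typesafe Config API. The following is only a minimal sketch of the structure the code expects, parsed from an inline HOCON string; the URL, user name, and password values are placeholder assumptions, not taken from the original post:

    import com.typesafe.config.ConfigFactory

    // Sketch only: the same keys the code reads (neo4j.url / neo4j.user / neo4j.password),
    // filled with placeholder values
    val sampleConfig = ConfigFactory.parseString(
      """
        |neo4j {
        |  url      = "bolt://localhost:7687"
        |  user     = "neo4j"
        |  password = "secret"
        |}
      """.stripMargin)

    println(sampleConfig.getString("neo4j.url"))  // prints bolt://localhost:7687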

1 Answer:

Answer 0 (score: 0)

I solved the problem by passing a SparkConf that carries the required Neo4j parameters to the SparkSession. Here is the code:

    val config = "neo4j_local"
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("KafkaSparkStreaming")

    // neo4jLocalConfig is loaded from configs/local_neo4j.conf as in the question;
    // the Neo4j settings go into the SparkConf before the session is created
    sparkConf.set("spark.neo4j.bolt.url", neo4jLocalConfig.getString("neo4j.url"))
    sparkConf.set("spark.neo4j.bolt.user", neo4jLocalConfig.getString("neo4j.user"))
    sparkConf.set("spark.neo4j.bolt.password", neo4jLocalConfig.getString("neo4j.password"))

    val sparkSession = SparkSession
      .builder()
      .config(sparkConf)
      .getOrCreate()
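
With the Neo4j settings in the SparkConf, the query parameters can then come from the stream itself rather than from a hard-coded array. Below is only a minimal sketch of that wiring: it assumes a DStream[String] named stream obtained from Kafka (consumer setup omitted), records in a hypothetical "id,lat,lon" comma-separated format, and reuses neo, query, streamingContext, and the DataFrame schema from the question.

    // `stream` stands for the Kafka-backed DStream[String]; obtaining it
    // (for example with KafkaUtils.createDirectStream) is omitted here.
    stream.foreachRDD { rdd =>
      // loadDataFrame builds the DataFrame on the driver, so the small
      // batch of parameter records is collected to the driver first
      rdd.collect().foreach { record =>
        val Array(id, lat, lon) = record.split(",")
        val paramsMap = Map(
          "lat" -> lat.toDouble,
          "lon" -> lon.toDouble,
          "id"  -> id.toInt)
        val df = neo.cypher(query, paramsMap)
          .loadDataFrame("group_name" -> "string",
                         "event_name" -> "string",
                         "venue_name" -> "string")
        df.show()
      }
    }

    streamingContext.start()
    streamingContext.awaitTermination()

In this sketch the per-record query runs inside foreachRDD, once per micro-batch after streamingContext.start() is called, whereas the test code in the question runs it a single time before the streaming context starts.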