GraphX Pregel and Spark Streaming: RDDs pushed into rddQueue from vprog are not processed

Time: 2017-08-18 13:20:57

Tags: apache-spark spark-streaming spark-graphx

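I am using GraphX Pregel together with Spark Streaming. I want the vertex program (vprog) to create an RDD and push it into the RDD queue (queueOfRDDs) so that the stream processes it.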

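import scala.collection.mutable.Queue
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// sc is the existing SparkContext
val queueOfRDDs: Queue[RDD[Int]] = Queue.empty[RDD[Int]]
@transient val streamingContext: StreamingContext = new StreamingContext(sc, Seconds(1))
val inputDStream = streamingContext.queueStream(queueOfRDDs, true, null)
inputDStream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()
streamingContext.start()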

val initialMessage = "init"

def vertexProgram(id: VertexId, attr: String, msgs: String): String = {
  queueOfRDDs.synchronized {
    for (a <- 1 to 3) {
      // Runs inside a task: creates an RDD and appends it to the queue
      queueOfRDDs += sc.makeRDD(1 to 1000, 10)
      println("will add " + queueOfRDDs.size)
    }
  }
  msgs
}

def sendMessage(...){...}
def messageCombiner(...){...}
val newGraph = Pregel.apply(graph, initialMessage, 1, EdgeDirection.Out)(vertexProgram, sendMessage, messageCombiner)

The expected result is:

    will add 1
    will add 2
    will add 3
    will add 4
    will add 5
    will add 6
    will add 7
    ...
    -------------------------------------------
    Time: 1503048820000 ms
    -------------------------------------------
    (0,100)
    (6,100)
    (3,100)
    (9,100)
    (4,100)
    (1,100)
    (7,100)
    (8,100)
    (5,100)
    (2,100)

    -------------------------------------------
    Time: 1503048820000 ms
    -------------------------------------------
    (0,100)
    (6,100)
    (3,100)
    (9,100)
    (4,100)
    (1,100)
    (7,100)
    (8,100)
    (5,100)
    (2,100)

    ...

    -------------------------------------------
    Time: 1503048820000 ms
    -------------------------------------------
    (0,100)
    (6,100)
    (3,100)
    (9,100)
    (4,100)
    (1,100)
    (7,100)
    (8,100)
    (5,100)
    (2,100)

But I get this result instead:

    will add 1
    will add 2
    will add 3
    will add 4
    will add 5
    will add 6
    will add 7
    ...

The RDDs are pushed into queueOfRDDs (its size increases), but they are never processed. Can you help me?

1 answer:

Answer 0 (score: 0)

TL;DR: This does not work.

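This code does not look right. It looks like you are trying to create an RDD from within a task (vertexProgram), presumably by making the SparkContext lazy or by using an object wrapper. RDDs can only be created on the driver, not inside code that runs as part of a task.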
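For contrast, here is a minimal sketch of the same queue stream fed from the driver, which is where RDD creation is supported. The object name DriverSideQueueSketch, the helper enqueueBatches, the local[2] master and the 5-second timeout are all assumptions made for illustration; none of them come from the question.

```scala
import scala.collection.mutable.Queue
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical sketch: all names below are illustrative, not from the question.
object DriverSideQueueSketch {

  // Creates n RDDs on the driver and appends them to the queue.
  // Returns the queue size after the last append.
  def enqueueBatches(sc: SparkContext, queue: Queue[RDD[Int]], n: Int): Int =
    queue.synchronized {
      for (_ <- 1 to n) {
        queue += sc.makeRDD(1 to 1000, 10)
        println("will add " + queue.size)
      }
      queue.size
    }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, just for trying this out.
    val conf = new SparkConf().setAppName("DriverSideQueueSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val streamingContext = new StreamingContext(sc, Seconds(1))

    val queueOfRDDs = Queue.empty[RDD[Int]]
    val inputDStream = streamingContext.queueStream(queueOfRDDs)
    inputDStream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()
    streamingContext.start()

    // This call runs on the driver, so the streaming job sees the same queue.
    enqueueBatches(sc, queueOfRDDs, 3)

    // Assumption: let a few one-second batches run, then shut down.
    streamingContext.awaitTerminationOrTimeout(5000)
    streamingContext.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

If the RDDs need to depend on what Pregel computed, one option under the same assumptions is to let Pregel finish, bring the relevant vertex data back to the driver, and call something like enqueueBatches from there.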

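The vertex program in your question appends to a local copy of the Queue, which is not visible to the actual driver. Even if it were visible, the RDDs would belong to a different context.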
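The "local copy" effect can be reproduced without Spark at all. The sketch below round-trips a queue through Java serialization, which is roughly what happens to the variables a closure captures when it is shipped to a task. The names ClosureCopyDemo and roundTrip are hypothetical, and this is a simplified model of closure shipping, not Spark's actual code path.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.Queue

// Hypothetical demo: models closure shipping with plain Java serialization.
object ClosureCopyDemo {

  // Serializes a value to bytes and reads it back, producing an independent copy.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverQueue = Queue.empty[Int]      // stands in for the queue on the driver
    val taskQueue = roundTrip(driverQueue)  // stands in for the copy a task receives

    for (a <- 1 to 3) taskQueue += a        // the "task" appends to its own copy

    println("task copy size: " + taskQueue.size)      // prints 3
    println("driver queue size: " + driverQueue.size) // prints 0
  }
}
```

The copy grows while the original stays empty, which matches the symptom in the question: the println inside vprog reports an increasing size, yet the queue read by queueStream on the driver never receives anything.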