GraphX: look up vertexLabel from vertexID

Time: 2016-07-20 09:00:41

Tags: scala apache-spark hash lookup spark-graphx

I have created a graph with Apache Spark GraphX whose vertices have the form
(vertexId,vertexLabel)

graph.vertices.take(5)
(73607571123990017,157.55.145.210)
(-8476294060085646488,65.55.116.184)
(-1290863642671546500,184.73.235.12)
(4333023396065188982,63.91.215.17)
(-8653425046038876102,23.62.195.78)

I have computed the single-source shortest paths from one vertex. The result has the following structure and output:

(dstID, (length, List(whole path)))

sssp.vertices.take(5)
(-912545243459764830,(3,List(223277346867836574, -7175187973700249964, 3342971904799511809, -912545243459764830)))
(2186653685768931954,(1000,List()))
(-5644725372565726221,(1000,List()))
(4398516124184853312,(3,List(223277346867836574, -7175187973700249964, 3342971904799511809, 4398516124184853312)))
(-7175187973700249964,(1,List(223277346867836574, -7175187973700249964)))

I want to look up the vertexLabel (e.g. 157.55.145.210) for each vertexId (e.g. 73607571123990017), so that the output of sssp.vertices.take(5) looks like this:

(145.22.33.456,(3,List(155.22.32.938, 185.42.53.756, 105.62.83.956, 125.26.73.656)))

I tried something like the following, but it only works for a single vertex, not for the whole output of sssp.vertices.take(5):

graph.vertices.filter{case(id, _) => id==223277346867836574L}.collect

How can I output the shortest paths in the form shown above?

1 Answer:

Answer 0: (score: 0)

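You have to join with the original graph to get the labels back. I flatten all the paths into one large RDD and do a single join to fetch the labels. Vertices with an empty path (the unreachable ones) produce no records here, so they should be handled separately with another join.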

// Flatten every path into (vertex, (destination, position)) records
val pathsRDD = sssp.vertices.values.flatMap { case (_, vertices) =>
  if (vertices.isEmpty) {
    // Unreachable vertex: nothing to emit
    Seq.empty
  } else {
    val dest = vertices.last
    // Store an index so we can reconstruct the list in the correct order
    vertices.zipWithIndex.map { case (v, index) =>
      (v, (dest, index))
    }
  }
}
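
To make the shape of these records concrete, here is a minimal sketch on plain Scala collections, with no Spark involved. It flattens one path from the question, deliberately scrambles the records (a shuffle gives no ordering guarantee), and restores the order from the stored index. The object name, the `relabel` helper, and the labels in the `labels` map are hypothetical stand-ins for `graph.vertices`, not values from the real graph.

```scala
object FlattenPathSketch {
  // One path from the question's sssp output (source first, destination last).
  val path: List[Long] = List(
    223277346867836574L, -7175187973700249964L,
    3342971904799511809L, -912545243459764830L)

  // Hypothetical labels standing in for graph.vertices.
  val labels: Map[Long, String] = Map(
    223277346867836574L -> "10.0.0.1",
    -7175187973700249964L -> "10.0.0.2",
    3342971904799511809L -> "10.0.0.3",
    -912545243459764830L -> "10.0.0.4")

  // Same shape as the flatMap body: (vertex, (destination, position)).
  def flatten(vertices: List[Long]): Seq[(Long, (Long, Int))] =
    if (vertices.isEmpty) Seq.empty
    else {
      val dest = vertices.last
      vertices.zipWithIndex.map { case (v, index) => (v, (dest, index)) }
    }

  // Local stand-in for join + groupByKey: the records are reversed on purpose,
  // and sorting by the stored index recovers the original path order.
  def relabel(records: Seq[(Long, (Long, Int))]): Map[Long, List[String]] =
    records.reverse
      .map { case (v, (dest, index)) => (dest, (labels(v), index)) }
      .groupBy(_._1)
      .map { case (dest, group) =>
        dest -> group.map(_._2).sortBy(_._2).map(_._1).toList
      }
}
```

Without the index, the grouped labels would come back in whatever order the records happened to arrive, so the path could not be rebuilt reliably.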

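// Attach the label to every path vertex, then regroup the records by destination
pathsRDD.join(graph.vertices).map { case (vertex, ((dest, index), label)) =>
  (dest, (label, index))
}.groupByKey.map { case (dest, iter) =>
  // Sort by the stored index to restore the original path order
  val out = iter.toList.sortBy(_._2).map(_._1)
  // A path with n vertices has n - 1 edges (assuming unit edge weights,
  // as in the question's output, where length 3 goes with 4 vertices)
  (out.last, (out.size - 1, out))
}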
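
The separate join for vertices with an empty path is not spelled out above. A minimal sketch of it might look like the following. It assumes the path length is an `Int`, the path is a `List[VertexId]`, and the labels are `String`, as the printed output suggests. The object and method names are hypothetical.

```scala
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD

object UnreachableLabels {
  // Assumed types: (id, (length, path)) for the sssp vertices, (id, label) for the graph.
  def labelUnreachable(
      ssspVertices: RDD[(VertexId, (Int, List[VertexId]))],
      labels: RDD[(VertexId, String)]
  ): RDD[(String, (Int, List[String]))] =
    ssspVertices
      // Keep only the vertices whose path is empty (unreachable from the source)
      .filter { case (_, (_, path)) => path.isEmpty }
      // Join on the vertex id: (id, ((length, path), label))
      .join(labels)
      // Replace the id with its label and keep the length as it was
      .map { case (_, ((length, _), label)) => (label, (length, List.empty[String])) }
}
```

Under the same assumptions, `UnreachableLabels.labelUnreachable(sssp.vertices, graph.vertices)` could then be combined with the labelled paths through `union` to cover every vertex.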