Scala - Spark: returning the vertex attributes of a specific node

Date: 2017-09-25 15:51:12

Tags: scala graph spark-graphx

I have a graph and I want to compute the maximum degree. In particular, I want to find the vertex with the maximum degree and retrieve all of its attributes. Here is the code snippet:

def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
    if (a._2 > b._2) a else b
} 

val maxDegrees : (VertexId, Int) = graphX.degrees.reduce(max)
max: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))(org.apache.spark.graphx.VertexId, Int) 
maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (2063726182,56387)

val startVertexRDD = graphX.vertices.filter{case (hash_id, (id, state)) => hash_id == maxDegrees._1}
startVertexRDD.collect()

But it throws this exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 1 times, most recent failure: Lost task 0.0 in stage 145.0 (TID 5380, localhost, executor driver): scala.MatchError: (1009147972,null) (of class scala.Tuple2)

How can I fix this?

1 answer:

Answer 0 (score: 2)

I think this is the problem. Here:

val startVertexRDD = graphX.vertices.filter{case (hash_id, (id, state)) => hash_id == maxDegrees._1}

So it is trying to match tuples like this:

(2063726182,56387)

against a pattern expecting something like this:

(hash_id, (id, state))

The scala.MatchError is raised because a Tuple2 of shape (VertexId, attr) is being matched against a pattern of shape (VertexId, (id, state)): whenever the attribute is not a two-element tuple, the pattern fails.
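The mechanics of this failure can be reproduced without Spark. A minimal sketch in plain Scala (the vertex ids are taken from the error messages above; the attribute values are made up for illustration): a `{ case ... }` block passed to `filter` compiles to a total function that throws `scala.MatchError` on any element the pattern does not cover.

```scala
// Hypothetical (VertexId, attribute) pairs; one attribute is null,
// as in the exception message from the question.
val vertices = Seq(
  (2063726182L, ("u1", "active")),
  (1009147972L, null)
)

// This mirrors the failing filter: the pattern destructures the attribute
// as a nested tuple, so the (1009147972, null) element throws scala.MatchError:
//   vertices.filter { case (id, (a, b)) => id == 2063726182L }

// Matching only on the vertex id never destructures the attribute:
val safe = vertices.filter { case (id, _) => id == 2063726182L }
// safe == Seq((2063726182L, ("u1", "active")))
```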

Also, be careful with this:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 1 times, most recent failure: Lost task 0.0 in stage 145.0 (TID 5380, localhost, executor driver): scala.MatchError: (1009147972,null) (of class scala.Tuple2)

Specifically:

scala.MatchError: (1009147972,null)

The attribute of vertex 1009147972 is null, so even a pattern that otherwise fits the vertex tuples will fail when it tries to destructure that attribute.
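The diagnosis above suggests a fix: match only on the vertex id and bind the attribute without destructuring it. A sketch, assuming the `graphX` and `maxDegrees` values from the question (not tested against the asker's data):

```scala
// Match only the vertex id; the attribute is bound as-is (possibly null)
// instead of being destructured, so no MatchError can occur.
val startVertexRDD = graphX.vertices.filter {
  case (vertexId, _) => vertexId == maxDegrees._1
}
startVertexRDD.collect()
```

If the `(id, state)` fields are actually needed, they can be extracted after the filter with an explicit null check, rather than inside the filter's pattern.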

Hope this helps.