Spark GraphX: how do I pass an array to filter graph edges?

Asked: 2017-04-03 05:05:04

Tags: arrays scala apache-spark spark-graphx

I am using Scala with GraphX on Spark 2.1.0. I have an array like the following:

scala> TEMP1Vertex.take(5)
res46: Array[org.apache.spark.graphx.VertexId] = Array(-1895512637, -1745667420, -1448961741, -1352361520, -1286348803)

If I filter the edge table on a single value, say source ID -1895512637, it works as expected:

val TEMP1Edge = graph.edges.filter { case Edge(src, dst, prop) => src == -1895512637}

scala> TEMP1Edge.take(5)
res52: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(-1895512637,-2105158920,89), Edge(-1895512637,-2020727043,3), Edge(-1895512637,-1963423298,449), Edge(-1895512637,-1855207100,214), Edge(-1895512637,-1852287689,339))

scala> TEMP1Edge.count
17/04/03 10:20:31 WARN Executor: 1 block locks were not released by TID = 1436:[rdd_36_2]
res53: Long = 126

But when I pass an array containing a set of unique source IDs, the code runs successfully yet returns no values, as shown below:

scala> val TEMP1Edge = graph.edges.filter { case Edge(src, dst, prop) => src == TEMP1Vertex}
TEMP1Edge: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[Int]] = MapPartitionsRDD[929] at filter at <console>:56

scala> TEMP1Edge.take(5)
17/04/03 10:29:07 WARN Executor: 1 block locks were not released by TID = 1471:[rdd_36_5]
res60: Array[org.apache.spark.graphx.Edge[Int]] = Array()

scala> TEMP1Edge.count
17/04/03 10:29:10 WARN Executor: 1 block locks were not released by TID = 1477:[rdd_36_5]
res61: Long = 0

1 Answer:

Answer 0 (score: 2)

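I assume TEMP1Vertex is of type Array[VertexId]. In that case src == TEMP1Vertex compares a single VertexId with the whole array, which is never true, so the filter returns nothing. I think your code should instead check whether src is contained in the array, for example by using TEMP1Vertex.contains(src) as the filter condition.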
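The membership test can be sketched in plain Scala, without a Spark cluster. This is only an illustration under assumptions: the local `Edge` case class is a hypothetical stand-in that mirrors the shape of `org.apache.spark.graphx.Edge`, and `filterBySource`, `temp1Vertex`, and the sample edges are made-up names and data, not part of the original question or answer.

```scala
// Hypothetical stand-in for org.apache.spark.graphx.Edge so the sketch runs without Spark.
final case class Edge[ED](srcId: Long, dstId: Long, attr: ED)

object FilterBySourceIds {
  // Keep only the edges whose source ID is a member of `ids`.
  // The predicate is a membership test, not an equality test against the collection.
  def filterBySource[ED](edges: Seq[Edge[ED]], ids: Set[Long]): Seq[Edge[ED]] =
    edges.filter { case Edge(src, _, _) => ids.contains(src) }

  // Illustrative data only (VertexId is an alias for Long in GraphX).
  val temp1Vertex: Array[Long] = Array(-1895512637L, -1745667420L)

  val edges: Seq[Edge[Int]] = Seq(
    Edge(-1895512637L, -2105158920L, 89),
    Edge(-1745667420L, -2020727043L, 3),
    Edge(-1000000000L, -1963423298L, 449) // source ID not in temp1Vertex
  )

  // Converting the array to a Set avoids a linear scan of the array for every edge.
  val kept: Seq[Edge[Int]] = filterBySource(edges, temp1Vertex.toSet)
}
```

On the real graph, the same predicate would go inside graph.edges.filter, with a set built from TEMP1Vertex in place of ids. If the array is large, wrapping that set in a Spark broadcast variable may be worth considering so that it is shipped to each executor only once.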