I implemented a kind of breadth-first search for an unweighted graph with Spark RDDs in two ways:

1. While loop
import org.apache.spark.SparkContext

object StrangeBFS {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "StrangeBFS")
    val raw = sc.textFile(args(0))

    val WHITE = "WHITE" // unexplored vertices
    val GRAY = "GRAY"   // vertices on the current frontier
    val BLACK = "BLACK" // fully explored vertices
    val INF = -1

    case class BFSNode(id: String, idList: List[String], w: Int, color: String)

    // parse each input line into a vertex with its adjacency list
    val whiteGraph = raw.map(s => {
      val ss = s.split(" ").toList
      BFSNode(id = ss.head, idList = ss.tail, w = INF, color = WHITE)
    })

    val exploredId = args(1)
    // color the start vertex GRAY
    var graph = whiteGraph.map(e => {
      if (e.id == exploredId) {
        e.copy(color = GRAY)
      } else {
        e
      }
    })

    var result: Map[String, Int] = Map.empty // (vertex, weight)

    def evalGrayList() = graph.filter(_.color == GRAY).collect().toList
    var grayList = evalGrayList()

    def graph2Str(): String = {
      s"${System.lineSeparator()}${graph.collect().toList.mkString(System.lineSeparator())}"
    }

    while (grayList.nonEmpty) {
      // turn the current GRAY frontier BLACK, assigning distances
      graph = graph.map(e => {
        if (grayList.map(_.id).contains(e.id)) {
          e.copy(w = grayList.filter(_.id == e.id).head.w + 1, color = BLACK)
        } else {
          e
        }
      })
      val blacked = graph.filter(e => grayList.map(_.id).contains(e.id)).collect().toList // bfs nodes
      // map every neighbour of the frontier to a distance; all edges have weight 1
      val id2W = blacked.map(e => (e.idList, e.w)).flatMap(e => e._1.map(s => s -> e._2)).toMap
      result = result ++ blacked.map(e => e.id -> e.w).toMap
      // promote unexplored neighbours to the new GRAY frontier
      graph = graph.map(e => {
        if (id2W.keys.toSet.contains(e.id) && e.color == WHITE) {
          e.copy(w = id2W(e.id), color = GRAY)
        } else {
          e
        }
      })
      // look here!!!
      println(s"before filtering: ${graph2Str()}")
      grayList = evalGrayList()
      println(s"after filtering: ${graph2Str()}")
    }

    println(s"result: ${System.lineSeparator()}")
    result.foreach(println)
    sc.stop()
  }
}
2. Tail recursion, without any vars

If you are interested, see the code: https://github.com/DedkovVA/CodeExamples/blob/master/src/main/scala/BFS.scala

So, approach 2 works correctly.
I tried it on this small data set in a file:
1 2 3 4
2 1 3
3 1 2 4
4 1 3
Each line here represents a vertex (the first number) followed by its adjacency list.
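For example, the first line parses into the following node (using the BFSNode case class from approach 1, before any coloring):

// "1 2 3 4" -> vertex "1" adjacent to "2", "3" and "4", not yet explored
BFSNode(id = "1", idList = List("2", "3", "4"), w = -1, color = "WHITE")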
But for the same data I see strange output from approach 1 (I started exploring from the vertex with id = "1"):
before filtering:
BFSNode(1,List(2, 3, 4),0,BLACK)
BFSNode(2,List(1, 3),0,GRAY)
BFSNode(3,List(1, 2, 4),0,GRAY)
BFSNode(4,List(1, 3),0,GRAY)
after filtering:
BFSNode(1,List(2, 3, 4),-1,GRAY)
BFSNode(2,List(1, 3),1,BLACK)
BFSNode(3,List(1, 2, 4),1,BLACK)
BFSNode(4,List(1, 3),1,BLACK)
And finally, after a few iterations, I got an error:
17/08/07 22:28:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 1890,5,run-main-group-0]
java.lang.StackOverflowError
at java.io.ObjectStreamClass.setPrimFieldValues(ObjectStreamClass.java:1287)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2009)
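My guess about the StackOverflowError itself: graph is reassigned to a new transformation of itself on every iteration without ever being cached or checkpointed, so its lineage grows without bound, and serializing that deep chain eventually overflows the stack. A minimal sketch of what I mean, assuming a scratch checkpoint directory (the path is a placeholder I picked; cache(), checkpoint() and setCheckpointDir() are standard Spark RDD/SparkContext methods):

sc.setCheckpointDir("/tmp/bfs-checkpoint") // placeholder directory, my assumption
while (grayList.nonEmpty) {
  // ... the same transformations as above ...
  graph.cache()      // keep this iteration's result instead of recomputing it
  graph.checkpoint() // truncate the lineage so it cannot grow without bound
  grayList = evalGrayList() // the action that materializes the RDD
}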
It looks as if there is some reordering of the code inside the while loop, or some re-evaluation of the transformations on the var RDD graph.
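To show what I mean by re-transformation, here is a minimal standalone sketch (not my BFS code; the names are made up) of how a transformation that closes over a var is re-evaluated by every action with the var's current value:

import org.apache.spark.SparkContext

object LazyVarDemo { // hypothetical demo object
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "LazyVarDemo")
    var offset = 0
    val rdd = sc.parallelize(Seq(1, 2, 3)).map(_ + offset) // closes over the var
    println(rdd.collect().toList) // List(1, 2, 3)
    offset = 10
    // same RDD, no new transformation, yet the result changes:
    println(rdd.collect().toList) // List(11, 12, 13)
    sc.stop()
  }
}

If the same thing happens to the grayList captured in graph's closures, that might explain the flip-flopping colors in the output above.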
I tested it locally with Scala 2.11.8 and Spark 2.2.0. Could someone please explain this strange output?