var RDD in a while loop

Asked: 2017-08-07 20:38:41

Tags: scala apache-spark immutability rdd var

I implemented a kind of breadth-first search over an unweighted graph in two ways using Spark RDDs:

1. While loop

import org.apache.spark.SparkContext

object StrangeBFS {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "StrangeBFS")
    val raw = sc.textFile(args(0))

    val WHITE = "WHITE" // for unexplored vertices
    val GRAY = "GRAY"   // for vertices currently being explored
    val BLACK = "BLACK" // for already explored vertices

    val INF = -1

    case class BFSNode(id: String, idList: List[String], w: Int, color: String)

    val whiteGraph = raw.map(s => {
      val ss = s.split(" ").toList
      BFSNode(id = ss.head, idList = ss.tail, w = INF, color = WHITE)
    })

    val exploredId = args(1)

    var graph = whiteGraph.map(e => {
      if (e.id == exploredId) {
        e.copy(color = GRAY)
      } else {
        e
      }
    })

    var result: Map[String, Int] = Map.empty//(vertex, weight)

    def evalGrayList() = graph.filter(_.color == GRAY).collect().toList

    var grayList = evalGrayList()

    def graph2Str(): String = {
      s"${System.lineSeparator()}${graph.collect().toList.mkString(System.lineSeparator())}"
    }

    while (grayList.nonEmpty) {
      graph = graph.map(e => {
        if (grayList.map(_.id).contains(e.id)) {
          e.copy(w = grayList.filter(_.id == e.id).head.w + 1, color = BLACK)
        } else {
          e
        }
      })

      val blacked = graph.filter(e => grayList.map(_.id).contains(e.id)).collect().toList//bfs nodes
      val id2W = blacked.map(e => (e.idList, e.w)).flatMap(e => e._1.map(s => s -> e._2)).toMap

      // because all edges have the same weight 1
      result = result ++ blacked.map(e => e.id -> e.w).toMap

      graph = graph.map(e => {
        if (id2W.keys.toSet.contains(e.id) && e.color == WHITE) {
          e.copy(w = id2W(e.id), color = GRAY)
        } else {
          e
        }
      })

      //look here!!!
      println(s"before filtering: ${graph2Str()}")

      grayList = evalGrayList()

      println(s"after filtering: ${graph2Str()}")
    }

    println(s"result: ${System.lineSeparator()}")
    result.foreach(println)

    sc.stop()
  }
}

2. Tail recursion, without any vars

If you're interested, see the code here: https://github.com/DedkovVA/CodeExamples/blob/master/src/main/scala/BFS.scala

Approach 2 works correctly.

I tried it on this small dataset in the input file:

1 2 3 4
2 1 3
3 1 2 4
4 1 3

Each line here represents a vertex (the first number) followed by its adjacency list.
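
For example, the map above turns the first line "1 2 3 4" into:

BFSNode(id = "1", idList = List("2", "3", "4"), w = -1, color = "WHITE")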

But with the same data I see strange output from approach 1 (I started exploring from the vertex with id "1"):

before filtering: 
BFSNode(1,List(2, 3, 4),0,BLACK)
BFSNode(2,List(1, 3),0,GRAY)
BFSNode(3,List(1, 2, 4),0,GRAY)
BFSNode(4,List(1, 3),0,GRAY)

after filtering: 
BFSNode(1,List(2, 3, 4),-1,GRAY)
BFSNode(2,List(1, 3),1,BLACK)
BFSNode(3,List(1, 2, 4),1,BLACK)
BFSNode(4,List(1, 3),1,BLACK)

Finally, after a few iterations, I got this error:

17/08/07 22:28:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 1890,5,run-main-group-0]
java.lang.StackOverflowError
        at java.io.ObjectStreamClass.setPrimFieldValues(ObjectStreamClass.java:1287)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2009)
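
The stack trace is inside Java object (de)serialization. My working assumption (I have not verified this) is that because graph is reassigned on every iteration and never cached or checkpointed, its lineage keeps growing, and serializing that ever-deeper chain of closures eventually overflows the stack. A rough sketch of how I understand lineage truncation would look inside such a loop (illustrative only, not code from my program):

var rdd = sc.parallelize(1 to 10)
for (_ <- 1 to 100) {
  rdd = rdd.map(_ + 1)
  rdd.localCheckpoint() // assumption: truncating the lineage here keeps it from growing
  rdd.count()           // force an action so the checkpoint actually takes effect
}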

It looks as if some code inside the while loop is being reordered, or the transformations on the var RDD graph are being re-executed.
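
To make the "re-transformation" idea concrete, here is a tiny toy example (separate from my BFS code; the names are made up). Since nothing is cached, I assume every action re-runs the whole lineage and the closure sees the current value of the captured var:

var threshold = 0
val numbers = sc.parallelize(1 to 5)
val filtered = numbers.filter(_ > threshold) // the closure captures the var threshold

println(filtered.count()) // I expect 5 here, evaluated with threshold = 0
threshold = 3
println(filtered.count()) // I expect 2 here: the same RDD, re-evaluated with threshold = 3

If that is really what happens, it would explain why the "before filtering" and "after filtering" prints above disagree: each println collects graph again, and by that point grayList has already been reassigned.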

I tested it locally with Scala 2.11.8 and Spark 2.2.0. Could someone please explain this strange output?

0 Answers:

No answers yet.