Standalone Spark: worker does not show up

时间:2015-04-26 08:10:46

标签: scala intellij-idea apache-spark

I have two questions:

Here is my code:

import org.apache.spark.{SparkConf, SparkContext}

object Hi {
  def main(args: Array[String]) {
    println("Sucess")
    // Run Spark locally inside this JVM
    val conf = new SparkConf().setAppName("HI").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile("src/main/scala/source.txt")
    // Each line looks like "range::ratednum"
    val rows = textFile.map { line =>
      val fields = line.split("::")
      (fields(0), fields(1).toInt)
    }
    val x = rows.map { case (range, ratednum) => range }.collect.mkString("::")
    val y = rows.map { case (range, ratednum) => ratednum }.collect.mkString("::")
    println(x)
    println(y)
    println("Sucess2")
  }
}

Here is part of the log:

15/04/26 16:49:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/26 16:49:57 INFO SparkUI: Started SparkUI at http://192.168.1.105:4040
15/04/26 16:49:57 INFO Executor: Starting executor ID <driver> on host localhost
15/04/26 16:49:57 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.105:64952/user/HeartbeatReceiver
15/04/26 16:49:57 INFO NettyBlockTransferService: Server created on 64954
15/04/26 16:49:57 INFO BlockManagerMaster: Trying to register BlockManager
15/04/26 16:49:57 INFO BlockManagerMasterActor: Registering block manager localhost:64954 with 983.1 MB RAM, BlockManagerId(<driver>, localhost, 64954)
.....
15/04/26 16:49:59 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
15/04/26 16:49:59 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[4] at map at Hi.scala:25)
15/04/26 16:49:59 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/04/26 16:49:59 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1331 bytes)
15/04/26 16:49:59 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/04/26 16:49:59 INFO HadoopRDD: Input split: file:/Users/Winsome/IdeaProjects/untitled/src/main/scala/source.txt:0+23
15/04/26 16:49:59 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1787 bytes result sent to driver
15/04/26 16:49:59 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 13 ms on localhost (1/1)
15/04/26 16:49:59 INFO DAGScheduler: Stage 1 (collect at Hi.scala:25) finished in 0.013 s
15/04/26 16:49:59 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/04/26 16:49:59 INFO DAGScheduler: Job 1 finished: collect at Hi.scala:25, took 0.027784 s
1~1::2~2::3~3
10::20::30
Sucess2

My first question is: when I check http://localhost:8080/ there are no workers. I also cannot open http://192.168.1.105:4040. Is this because I am running Spark standalone? How do I fix this?

(My environment is a Mac, and my IDE is IntelliJ.)


My second question is:

    val x = rows.map { case (range, ratednum) => range }.collect.mkString("::")
    val y = rows.map { case (range, ratednum) => ratednum }.collect.mkString("::")
    println(x)
    println(y)

I think there should be an easier way to get x and y (something like rows[range] and rows[ratednum]), but I am not familiar with Scala. Can you give me some advice?

1 Answer:

Answer 0 (score: 0)

I'm not sure about your first question, but looking at your log I see the task only lasted 13 ms, so that may be why you don't see the worker. Run a longer job and you may see it.
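For what it's worth, with setMaster("local") the driver and executor run inside a single JVM, so nothing ever registers with the standalone master UI at http://localhost:8080/, and the application UI on port 4040 only exists while the SparkContext is alive, which here is well under a second. Below is a rough sketch of what the setup could look like against an actual standalone cluster; the spark://localhost:7077 master URL is just a placeholder for wherever start-master.sh is running:

    // Sketch only: assumes a standalone master and worker have been started separately.
    import org.apache.spark.{SparkConf, SparkContext}

    object StandaloneHi {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("HI")
          .setMaster("spark://localhost:7077") // placeholder standalone master URL
        val sc = new SparkContext(conf)
        // ... run the job here ...
        Thread.sleep(60000) // keep the context alive so the 4040 UI stays reachable
        sc.stop()
      }
    }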

As for the second question, yes, there is a simpler way to write it:

val x = rows.map{(tuple) => tuple._1}.collect.mkString("::")

because your RDD is made of Scala Tuple objects, which consist of two fields, and you can access those fields with _1 and _2 respectively.
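For example, with the rows RDD from the question (an RDD[(String, Int)]), that would look like the following; the keys/values variant uses Spark's pair-RDD helpers and is just an alternative spelling of the same thing:

    // Sketch assuming the same rows: RDD[(String, Int)] as in the question.
    val x = rows.map(_._1).collect.mkString("::")   // first tuple field (range)
    val y = rows.map(_._2).collect.mkString("::")   // second tuple field (ratednum)

    // Equivalent, using the pair-RDD keys/values methods:
    val x2 = rows.keys.collect.mkString("::")
    val y2 = rows.values.collect.mkString("::")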