Question

我正在尝试使用Scala中的IntelliJ IDEA执行这个简单的Spark作业。但是，完成执行对象后，Spark UI将完全停止。有什么东西我缺少或在错误的位置听吗？ Scala版本 - 2.10.4和Spark - 1.6.0

import org.apache.spark.{SparkConf, SparkContext}
    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "C:/spark-1.6.0-bin-hadoop2.6/spark-1.6.0-bin-hadoop2.6/README.md" // Should be some file on your system
        val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }

16/02/24 01:24:39 INFO SparkContext: Running Spark version 1.6.0
16/02/24 01:24:40 INFO SecurityManager: Changing view acls to: Sivaram Konanki
16/02/24 01:24:40 INFO SecurityManager: Changing modify acls to: Sivaram Konanki
16/02/24 01:24:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Sivaram Konanki); users with modify permissions: Set(Sivaram Konanki)
16/02/24 01:24:41 INFO Utils: Successfully started service 'sparkDriver' on port 54881.
16/02/24 01:24:41 INFO Slf4jLogger: Slf4jLogger started
16/02/24 01:24:42 INFO Remoting: Starting remoting
16/02/24 01:24:42 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.15:54894]
16/02/24 01:24:42 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54894.
16/02/24 01:24:42 INFO SparkEnv: Registering MapOutputTracker
16/02/24 01:24:42 INFO SparkEnv: Registering BlockManagerMaster
16/02/24 01:24:42 INFO DiskBlockManager: Created local directory at C:\Users\Sivaram Konanki\AppData\Local\Temp\blockmgr-dad99e77-f3a6-4a1d-88d8-3b030be0bd0a
16/02/24 01:24:42 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
16/02/24 01:24:42 INFO SparkEnv: Registering OutputCommitCoordinator
16/02/24 01:24:42 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/02/24 01:24:42 INFO SparkUI: Started SparkUI at http://192.168.1.15:4040
16/02/24 01:24:42 INFO Executor: Starting executor ID driver on host localhost
16/02/24 01:24:43 INFO Utils: <b>Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 54913.
16/02/24 01:24:43 INFO NettyBlockTransferService: Server created on 54913
16/02/24 01:24:43 INFO BlockManagerMaster: Trying to register BlockManager
16/02/24 01:24:43 INFO BlockManagerMasterEndpoint: Registering block manager localhost:54913 with 2.4 GB RAM, BlockManagerId(driver, localhost, 54913)
16/02/24 01:24:43 INFO BlockManagerMaster: Registered BlockManager
16/02/24 01:24:44 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 127.4 KB, free 127.4 KB)
16/02/24 01:24:44 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 141.3 KB)
16/02/24 01:24:44 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:54913 (size: 13.9 KB, free: 2.4 GB)
16/02/24 01:24:44 INFO SparkContext: Created broadcast 0 from textFile at SimpleApp.scala:11
16/02/24 01:24:45 WARN : Your hostname, OSG-E5450-42 resolves to a loopback/non-reachable address: fe80:0:0:0:d9ff:4f93:5643:703d%wlan3, but we couldn't find any external IP address!
16/02/24 01:24:46 INFO FileInputFormat: Total input paths to process : 1
16/02/24 01:24:46 INFO SparkContext: Starting job: count at SimpleApp.scala:12
16/02/24 01:24:46 INFO DAGScheduler: Got job 0 (count at SimpleApp.scala:12) with 2 output partitions
16/02/24 01:24:46 INFO DAGScheduler: Final stage: ResultStage 0 (count at SimpleApp.scala:12)
16/02/24 01:24:46 INFO DAGScheduler: Parents of final stage: List()
16/02/24 01:24:46 INFO DAGScheduler: Missing parents: List()
16/02/24 01:24:46 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at filter at SimpleApp.scala:12), which has no missing parents
16/02/24 01:24:46 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 144.5 KB)
16/02/24 01:24:46 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1886.0 B, free 146.3 KB)
16/02/24 01:24:46 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:54913 (size: 1886.0 B, free: 2.4 GB)
16/02/24 01:24:46 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/02/24 01:24:46 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at filter at SimpleApp.scala:12)
16/02/24 01:24:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/02/24 01:24:46 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2172 bytes)
16/02/24 01:24:46 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2172 bytes)
16/02/24 01:24:46 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/02/24 01:24:46 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/02/24 01:24:46 INFO CacheManager: Partition rdd_1_1 not found, computing it
16/02/24 01:24:46 INFO CacheManager: Partition rdd_1_0 not found, computing it
16/02/24 01:24:46 INFO HadoopRDD: Input split: file:/C:/spark-1.6.0-bin-hadoop2.6/spark-1.6.0-bin-hadoop2.6/README.md:1679+1680
16/02/24 01:24:46 INFO HadoopRDD: Input split: file:/C:/spark-1.6.0-bin-hadoop2.6/spark-1.6.0-bin-hadoop2.6/README.md:0+1679
16/02/24 01:24:46 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/02/24 01:24:46 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/02/24 01:24:46 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/02/24 01:24:46 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/02/24 01:24:46 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/02/24 01:24:46 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 4.7 KB, free 151.0 KB)
16/02/24 01:24:46 INFO BlockManagerInfo: Added rdd_1_1 in memory on localhost:54913 (size: 4.7 KB, free: 2.4 GB)
16/02/24 01:24:46 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 5.4 KB, free 156.5 KB)
16/02/24 01:24:46 INFO BlockManagerInfo: Added rdd_1_0 in memory on localhost:54913 (size: 5.4 KB, free: 2.4 GB)
16/02/24 01:24:46 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2662 bytes result sent to driver
16/02/24 01:24:46 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2662 bytes result sent to driver
16/02/24 01:24:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 170 ms on localhost (1/2)
16/02/24 01:24:46 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 143 ms on localhost (2/2)
16/02/24 01:24:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/02/24 01:24:46 INFO DAGScheduler: ResultStage 0 (count at SimpleApp.scala:12) finished in 0.187 s
16/02/24 01:24:46 INFO DAGScheduler: Job 0 finished: count at SimpleApp.scala:12, took 0.303861 s
16/02/24 01:24:46 INFO SparkContext: Starting job: count at SimpleApp.scala:13
16/02/24 01:24:46 INFO DAGScheduler: Got job 1 (count at SimpleApp.scala:13) with 2 output partitions
16/02/24 01:24:46 INFO DAGScheduler: Final stage: ResultStage 1 (count at SimpleApp.scala:13)
16/02/24 01:24:46 INFO DAGScheduler: Parents of final stage: List()
16/02/24 01:24:46 INFO DAGScheduler: Missing parents: List()
16/02/24 01:24:46 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:13), which has no missing parents
16/02/24 01:24:46 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.1 KB, free 159.6 KB)
16/02/24 01:24:46 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1888.0 B, free 161.5 KB)16/02/24 01:24:46 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:54913 (size: 1888.0 B, free: 2.4 GB)
16/02/24 01:24:46 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/02/24 01:24:46 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:13)
16/02/24 01:24:46 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/02/24 01:24:46 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2172 bytes)
16/02/24 01:24:46 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, partition 1,PROCESS_LOCAL, 2172 bytes)
16/02/24 01:24:46 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
16/02/24 01:24:46 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
16/02/24 01:24:46 INFO BlockManager: Found block rdd_1_0 locally
16/02/24 01:24:46 INFO BlockManager: Found block rdd_1_1 locally
16/02/24 01:24:46 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 2082 bytes result sent to driver
16/02/24 01:24:46 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 2082 bytes result sent to driver
16/02/24 01:24:46 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 34 ms on localhost (1/2)
16/02/24 01:24:46 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 37 ms on localhost (2/2)
Lines with a: 58, Lines with b: 26
16/02/24 01:24:46 INFO DAGScheduler: ResultStage 1 (count at SimpleApp.scala:13) finished in 0.040 s
16/02/24 01:24:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/02/24 01:24:46 INFO DAGScheduler: Job 1 finished: count at SimpleApp.scala:13, took 0.068350 s
16/02/24 01:24:46 INFO SparkContext: Invoking stop() from shutdown hook
16/02/24 01:24:46 INFO SparkUI: Stopped Spark web UI at http://192.168.1.15:4040
16/02/24 01:24:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/02/24 01:24:46 INFO MemoryStore: MemoryStore cleared
16/02/24 01:24:46 INFO BlockManager: BlockManager stopped
16/02/24 01:24:46 INFO BlockManagerMaster: BlockManagerMaster stopped
16/02/24 01:24:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/02/24 01:24:46 INFO SparkContext: Successfully stopped SparkContext
16/02/24 01:24:46 INFO ShutdownHookManager: Shutdown hook called
16/02/24 01:24:46 INFO ShutdownHookManager: Deleting directory C:\Users\Sivaram Konanki\AppData\Local\Temp\spark-861b5aef-6732-45e4-a4f4-6769370c555e

Answer 1

这是预期的行为。 Spark UI由SparkContext维护，因此在应用程序完成并且上下文被销毁后它不能处于活动状态。

在独立模式下，集群Web UI保留信息，在Mesos或Yarn上，您可以使用历史记录服务器，但在本地模式下，我所知道的唯一选项是保持应用程序正常运行。

Answer 2

您可以添加

Thread.sleep(1000000);//For 1000 seconds or more

在spark作业的底部，这使您可以在运行Spark Job时在IntelliJ之类的IDE中检查WebUI。

在IntelliJ IDEA中执行代码后，SparkUI停止运行

2 个答案: