I am using Apache Spark in cluster mode, with a master node and 3 slaves (all 4 machines are separate VMs on the same LAN). After configuring the cluster I can see my worker nodes and the master node in the Spark web UI.
I am using Python 2.7 and Spark 1.4.1.
The real problem is that when I try to run Spark against the master (using PySpark in this case), it keeps logging errors in the Python console. I was able to capture the logs, but could not find any clue in them.
I am pasting my logs here for reference:
I also tried to find the slaves' logs under the following location: /usr/local/spark/work/
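For reference, the standalone worker keeps per-application executor logs under that directory. A sketch of how to inspect them (the app ID below is the one from the driver log; the executor subdirectory number is illustrative and will differ on each worker):

```shell
# On a worker machine: list the applications this worker has run
ls /usr/local/spark/work/

# Tail the stderr of one executor of the failing application
# (substitute the actual app ID and executor directory found on your worker)
tail -n 50 /usr/local/spark/work/app-20160329091651-0006/0/stderr
```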
ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077 bin/pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/03/29 09:16:48 INFO SparkContext: Running Spark version 1.4.1
16/03/29 09:16:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/29 09:16:49 INFO SecurityManager: Changing view acls to: ubuntu
16/03/29 09:16:49 INFO SecurityManager: Changing modify acls to: ubuntu
16/03/29 09:16:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/03/29 09:16:49 INFO Slf4jLogger: Slf4jLogger started
16/03/29 09:16:50 INFO Remoting: Starting remoting
16/03/29 09:16:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.81:34901]
16/03/29 09:16:50 INFO Utils: Successfully started service 'sparkDriver' on port 34901.
16/03/29 09:16:50 INFO SparkEnv: Registering MapOutputTracker
16/03/29 09:16:50 INFO SparkEnv: Registering BlockManagerMaster
16/03/29 09:16:50 INFO DiskBlockManager: Created local directory at /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/blockmgr-a9e868af-4253-4230-9227-948fbb8a0d91
16/03/29 09:16:50 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/03/29 09:16:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/httpd-a78e633c-0ae7-46cf-81e8-776d8f7c3c46
16/03/29 09:16:50 INFO HttpServer: Starting HTTP Server
16/03/29 09:16:50 INFO Utils: Successfully started service 'HTTP file server' on port 34364.
16/03/29 09:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/29 09:16:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/29 09:16:50 INFO SparkUI: Started SparkUI at http://173.220.132.82:4040
16/03/29 09:16:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@192.168.1.81:7077/user/Master...
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160329091651-0006
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/0 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/0 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/1 on worker-20160329072744-192.168.1.82-45482 (192.168.1.82:45482) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/1 on hostPort 192.168.1.82:45482 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/2 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/2 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now RUNNING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now RUNNING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now RUNNING
16/03/29 09:16:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42458.
16/03/29 09:16:51 INFO NettyBlockTransferService: Server created on 42458
16/03/29 09:16:51 INFO BlockManagerMaster: Trying to register BlockManager
16/03/29 09:16:51 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.81:42458 with 265.4 MB RAM, BlockManagerId(driver, 192.168.1.81, 42458)
16/03/29 09:16:51 INFO BlockManagerMaster: Registered BlockManager
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.4.1
/_/
Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkContext available as sc, HiveContext available as sqlContext.
>>> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now EXITED (Command exited with code 1)
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/0 removed: Command exited with code 1
16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/3 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/3 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now EXITED (Command exited with code 1)
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/2 removed: Command exited with code 1
16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/4 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/4 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now LOADING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now LOADING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now RUNNING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now RUNNING
16/03/29 09:16:54 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now EXITED (Command exited with code 1)
16/03/29 09:16:54 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/1 removed: Command exited with code 1
16/03/29 09:16:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
Any quick suggestions would help me.
Answer 0 (score: 1)
I was finally able to solve this problem. It happened because of some iptables rules blocking the ports Spark uses.
I fixed it by running the following command to allow a specific port:
sudo iptables -I INPUT 1 -p tcp --dport 5678 -j ACCEPT
Since Spark listens on random ports, you can instead whitelist all incoming traffic (note this disables the firewall for inbound TCP, so only do it on a trusted LAN):
sudo iptables -I INPUT -j ACCEPT
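Before opening everything, it can help to confirm which ports are actually blocked. A minimal probe using only the Python standard library (the host and port below are illustrative; run it from a worker against the driver's address and the ports shown in the driver log, e.g. 34901 for sparkDriver):

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, OSError):
        # Refused, filtered by a firewall, or unreachable
        return False
    sock.close()
    return True


# Example: from a worker, check the driver's sparkDriver port
# port_open("192.168.1.81", 34901)
```

If the probe returns False from a worker but True from the driver machine itself, a firewall rule between the machines is the likely culprit.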
Answer 1 (score: 0)
I ran into exactly the same problem in my Spark cluster.
This problem usually occurs when you run a very intensive application on Spark and then try to run another application afterwards.
What actually happens is that one of the executors becomes unresponsive after the intensive previous run, but Spark keeps poking it. I am not sure why Spark keeps doing this in standalone and cluster mode, as it does not happen in YARN mode. Ideally Spark is robust enough to keep a job running even if one executor/worker fails. To fix this, you need to restart the unresponsive workers and executors; your application will then start running. If you cannot tell which executor is unresponsive, just restart all of them. That will resolve the problem.
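On a standalone cluster the workers can be bounced with Spark's own scripts. A sketch, assuming the installation path from the question (/usr/local/spark) and that the workers are listed in conf/slaves on the master, which lets you restart them all from the master machine:

```shell
# Run on the master: stop all workers listed in conf/slaves over SSH,
# then start them again so they re-register with the master
/usr/local/spark/sbin/stop-slaves.sh
/usr/local/spark/sbin/start-slaves.sh
```

After the workers re-register (visible in the master's web UI on port 8080), re-submit the application.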