I am using Apache Spark in cluster mode, with a master node and 3 slaves (all 4 machines are separate VMs on the same LAN). After configuring the cluster I can see my worker nodes and the master node in the Spark web UI.
I am using Python 2.7 and Spark 1.4.1.
The real problem is that when I try to run Spark against the master (using PySpark in this case), it keeps logging errors in the Python console. I was able to capture the logs, but could not find any clue in them.
I am pasting my logs here for reference:
I also tried to find the slaves' logs under the following location: /usr/local/spark/work/
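For reference, the standalone worker keeps per-application executor logs under that directory. A sketch of how to inspect them (the app ID below is the one from the driver log; the executor subdirectory number is illustrative and will differ on each worker):

```shell
# On a worker machine: list the applications this worker has run
ls /usr/local/spark/work/

# Tail the stderr of one executor of the failing application
# (substitute the actual app ID and executor directory found on your worker)
tail -n 50 /usr/local/spark/work/app-20160329091651-0006/0/stderr
```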
ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077 bin/pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/03/29 09:16:48 INFO SparkContext: Running Spark version 1.4.1
16/03/29 09:16:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/29 09:16:49 INFO SecurityManager: Changing view acls to: ubuntu
16/03/29 09:16:49 INFO SecurityManager: Changing modify acls to: ubuntu
16/03/29 09:16:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/03/29 09:16:49 INFO Slf4jLogger: Slf4jLogger started
16/03/29 09:16:50 INFO Remoting: Starting remoting
16/03/29 09:16:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.81:34901]
16/03/29 09:16:50 INFO Utils: Successfully started service 'sparkDriver' on port 34901.
16/03/29 09:16:50 INFO SparkEnv: Registering MapOutputTracker
16/03/29 09:16:50 INFO SparkEnv: Registering BlockManagerMaster
16/03/29 09:16:50 INFO DiskBlockManager: Created local directory at /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/blockmgr-a9e868af-4253-4230-9227-948fbb8a0d91
16/03/29 09:16:50 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/03/29 09:16:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/httpd-a78e633c-0ae7-46cf-81e8-776d8f7c3c46
16/03/29 09:16:50 INFO HttpServer: Starting HTTP Server
16/03/29 09:16:50 INFO Utils: Successfully started service 'HTTP file server' on port 34364.
16/03/29 09:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/29 09:16:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/29 09:16:50 INFO SparkUI: Started SparkUI at http://173.220.132.82:4040
16/03/29 09:16:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@192.168.1.81:7077/user/Master...
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160329091651-0006
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/0 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/0 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/1 on worker-20160329072744-192.168.1.82-45482 (192.168.1.82:45482) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/1 on hostPort 192.168.1.82:45482 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/2 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/2 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now RUNNING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now RUNNING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now RUNNING
16/03/29 09:16:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42458.
16/03/29 09:16:51 INFO NettyBlockTransferService: Server created on 42458
16/03/29 09:16:51 INFO BlockManagerMaster: Trying to register BlockManager
16/03/29 09:16:51 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.81:42458 with 265.4 MB RAM, BlockManagerId(driver, 192.168.1.81, 42458)
16/03/29 09:16:51 INFO BlockManagerMaster: Registered BlockManager
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.4.1
/_/
Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkContext available as sc, HiveContext available as sqlContext.
>>> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now EXITED (Command exited with code 1)
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/0 removed: Command exited with code 1
16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/3 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/3 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now EXITED (Command exited with code 1)
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/2 removed: Command exited with code 1
16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/4 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/4 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now LOADING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now LOADING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now RUNNING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now RUNNING
16/03/29 09:16:54 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now EXITED (Command exited with code 1)
16/03/29 09:16:54 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/1 removed: Command exited with code 1
16/03/29 09:16:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
Any quick suggestions would help me.
Answer 0 (score: 1)
I was finally able to solve this problem. It happened because of some iptables rules blocking the ports Spark uses.
I fixed it by running the following command to allow a specific port:
sudo iptables -I INPUT 1 -p tcp --dport 5678 -j ACCEPT
Since Spark listens on random ports, you can instead whitelist all incoming traffic (note this disables the firewall for inbound TCP, so only do it on a trusted LAN):
sudo iptables -I INPUT -j ACCEPT
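Before opening everything, it can help to confirm which ports are actually blocked. A minimal probe using only the Python standard library (the host and port below are illustrative; run it from a worker against the driver's address and the ports shown in the driver log, e.g. 34901 for sparkDriver):

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, OSError):
        # Refused, filtered by a firewall, or unreachable
        return False
    sock.close()
    return True


# Example: from a worker, check the driver's sparkDriver port
# port_open("192.168.1.81", 34901)
```

If the probe returns False from a worker but True from the driver machine itself, a firewall rule between the machines is the likely culprit.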
Answer 1 (score: 0)
I ran into exactly the same problem in my Spark cluster.
This problem usually occurs when you run a very intensive application on Spark and then try to run another application afterwards.
What actually happens is that one of the executors becomes unresponsive after the intensive previous run, but Spark keeps poking it. I am not sure why Spark keeps doing this in standalone and cluster mode, as it does not happen in YARN mode. Ideally Spark is robust enough to keep a job running even if one executor/worker fails. To fix this, you need to restart the unresponsive workers and executors; your application will then start running. If you cannot tell which executor is unresponsive, just restart all of them. That will resolve the problem.
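On a standalone cluster the workers can be bounced with Spark's own scripts. A sketch, assuming the installation path from the question (/usr/local/spark) and that the workers are listed in conf/slaves on the master, which lets you restart them all from the master machine:

```shell
# Run on the master: stop all workers listed in conf/slaves over SSH,
# then start them again so they re-register with the master
/usr/local/spark/sbin/stop-slaves.sh
/usr/local/spark/sbin/start-slaves.sh
```

After the workers re-register (visible in the master's web UI on port 8080), re-submit the application.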