I am running Spark 1.4.1 in standalone mode on a 10-node cluster that is co-located with Cassandra. I have a job that reads from a single Cassandra table, performs two maps, and then writes the resulting RDD to an Elasticsearch cluster using the elasticsearch-hadoop library provided by Elasticsearch. Concretely, the job looks like the following. I don't think this matters much for the question, but for completeness:
val sc = new SparkContext(....)
val rdd = sc.cassandraTable("keyspace", "tablename")
.select("col1", "col2")
.map(c => caseclass(c.columnvalues(0), c.columnvalues(1)))
.map(cc => Map(... construct map for es ... ))
EsSpark.saveToEs(rdd, "/index/type")
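For reference, the SparkContext is configured roughly as sketched below. The host values are placeholders, not my real addresses; `spark.cassandra.connection.host` and `es.nodes` are the standard settings from the spark-cassandra-connector and elasticsearch-hadoop documentation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the context setup; host values below are placeholders.
val conf = new SparkConf()
  .setAppName("cassandra-to-es")
  // spark-cassandra-connector: Cassandra contact point
  .set("spark.cassandra.connection.host", "10.0.5.1")
  // elasticsearch-hadoop: ES nodes that EsSpark.saveToEs writes to
  .set("es.nodes", "es-host:9200")

val sc = new SparkContext(conf)
```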
The job starts off running fine, but after a few minutes it slows down sharply. Workers begin spending long stretches in the LOADING state, and all of the worker logs look similar to this:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/15 18:55:08 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/08/15 18:55:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/15 18:55:09 INFO SecurityManager: Changing view acls to: spark
16/08/15 18:55:09 INFO SecurityManager: Changing modify acls to: spark
16/08/15 18:55:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
16/08/15 18:55:09 INFO Slf4jLogger: Slf4jLogger started
16/08/15 18:55:09 INFO Remoting: Starting remoting
16/08/15 18:55:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.0.5.33:56472]
16/08/15 18:55:10 INFO Utils: Successfully started service 'driverPropsFetcher' on port 56472.
16/08/15 18:57:10 ERROR UserGroupInformation: PriviledgedActionException as:spark (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1504)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
... 4 more
16/08/15 18:57:10 INFO Utils: Shutdown hook called
All workers eventually end up in the EXITED state, and all tasks show as successful. They just slow down to an absolute crawl. When I look at resource usage across the cluster, nothing is anywhere near maxed out.
Any help with this issue would be greatly appreciated.