My script is written in Python, and it ran fine on DSE 4.8 without Docker. I have now upgraded to DSE 5.0.4 and run it in a Docker environment, and I get the RPC error below. I was on DSE Spark 1.4.1 before; now I am using 1.6.2.
The host OS is CentOS 7.2 and the Docker image runs the same OS. We submit the job with spark-submit and have tried giving the executors 2G, 4G, 6G and 8G; every setting fails with the same error message (a sketch of how these settings are applied is shown below).
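For reference, here is a minimal sketch (not the original submission command) of how the executor memory values above can be set from a PySpark 1.6 script; the same setting can also be passed to dse spark-submit via --conf spark.executor.memory=4g. The app name and the cores cap are hypothetical.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("user_profile_step1")     # hypothetical app name
        .set("spark.executor.memory", "4g")   # tried with 2g, 4g, 6g and 8g
        .set("spark.cores.max", "12"))        # hypothetical cap, not from the question
sc = SparkContext(conf=conf)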
The same Python script ran without problems in my previous environment, but after the upgrade it no longer works properly.
The Scala jobs run fine in the current environment; only the Python part fails. Rebooting the hosts does not solve the problem, and recreating the Docker containers does not help either.
Edit:
Maybe my map/reduce logic is too complex. The problem might be there, but I am not sure.
Environment specs: the cluster consists of 6 hosts, each with a 16-core CPU, 32G of RAM and a 500G SSD.
Any idea how to solve this? What does this error message mean? Thanks a lot! Let me know if you need more information.
Error log:
Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
WARN 2017-02-26 10:14:08,314 org.apache.spark.scheduler.TaskSetManager: Lost task 47.1 in stage 88.0 (TID 9705, 139.196.190.79): TaskKilled (killed intentionally)
Traceback (most recent call last):
File "/data/user_profile/User_profile_step1_classify_articles_common_sc_collect.py", line 1116, in <module>
compute_each_dimension_and_format_user(article_by_top_all_tmp)
File "/data/user_profile/User_profile_step1_classify_articles_common_sc_collect.py", line 752, in compute_each_dimension_and_format_user
sqlContext.createDataFrame(article_up_save_rdd, df_schema).write.format('org.apache.spark.sql.cassandra').options(keyspace='archive', table='articles_up_update').save(mode='append')
File "/opt/dse-5.0.4/resources/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 395, in save
WARN 2017-02-26 10:14:08,336 org.apache.spark.scheduler.TaskSetManager: Lost task 63.1 in stage 88.0 (TID 9704, 139.196.190.79): TaskKilled (killed intentionally)
File "/opt/dse-5.0.4/resources/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/dse-5.0.4/resources/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "/opt/dse-5.0.4/resources/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o795.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 619 in stage 88.0 failed 4 times, most recent failure: Lost task 619.3 in stage 88.0 (TID 9746, 139.196.107.73): ExecutorLostFailure (executor 59 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$han
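The failing call in the traceback (line 752 of the script) is a DataFrame write to Cassandra through the spark-cassandra-connector bundled with DSE. Below is a minimal, self-contained sketch of that write; the schema fields and sample rows are hypothetical placeholders, and the archive.articles_up_update table is assumed to already exist.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import StructType, StructField, StringType

sc = SparkContext(appName="user_profile_step1")   # hypothetical app name
sqlContext = SQLContext(sc)

# Hypothetical schema standing in for df_schema in the original script.
df_schema = StructType([
    StructField("user_id", StringType(), False),
    StructField("dimension", StringType(), True),
])

# Stand-in for article_up_save_rdd; the real RDD is built by the script's map/reduce steps.
article_up_save_rdd = sc.parallelize([("u1", "tech"), ("u2", "sports")])

(sqlContext.createDataFrame(article_up_save_rdd, df_schema)
    .write
    .format('org.apache.spark.sql.cassandra')
    .options(keyspace='archive', table='articles_up_update')
    .save(mode='append'))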
Docker command:
docker run -d --net=host -i --privileged \
-e SEEDS=10.XX.XXx.XX1,10.XX.XXx.XXX \
-e CLUSTER_NAME="MyCluster" \
-e LISTEN_ADDRESS=10.XX.XXx.XX \
-e BROADCAST_RPC_ADDRESS=139.XXX.XXX.XXX \
-e RPC_ADDRESS=0.0.0.0 \
-e STOMP_INTERFACE=10.XX.XXx.XX \
-e HOSTS=139.XX.XXx.XX \
-v /data/dse/lib/cassandra:/var/lib/cassandra \
-v /data/dse/lib/spark:/var/lib/spark \
-v /data/dse/log/cassandra:/var/log/cassandra \
-v /data/dse/log/spark:/var/log/spark \
-v /data/agent/log:/opt/datastax-agent/log \
--name dse_container registry..xxx.com/rechao/dse:5.0.4 -s
Answer 0 (score: 0)
Docker is fine; increasing the host memory to 64G solved the problem.
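A rough back-of-the-envelope view of why more host memory helps: on a DSE analytics node the Cassandra process and the Spark executors share the same host (and here the same container), so the executor memory requested above has to fit into whatever Cassandra and the OS leave free. The heap and overhead figures in this sketch are illustrative assumptions, not values from the question.

# Rough sizing sketch; cassandra_heap_gb and os_and_overhead_gb are assumed values.
def max_executor_memory_gb(host_gb, cassandra_heap_gb=8, os_and_overhead_gb=4):
    """Conservative upper bound for total spark.executor.memory on one host."""
    return max(host_gb - cassandra_heap_gb - os_and_overhead_gb, 1)

print(max_executor_memory_gb(32))  # ~20 GB of headroom on the original 32 GB hosts
print(max_executor_memory_gb(64))  # ~52 GB of headroom after upgrading to 64 GB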