I tried to upgrade Spark 1.5.2 to Spark 2.0.0 and tested it on two machines (node3 and node7). I submit the task through spark-2.0.0/spark-submit, but the task runs under Spark 1.5.2.
I get an error when submitting the task from node3:
~/software/spark-2.0.0-bin-hadoop2.6/bin$ spark-submit --master mesos://192.168.1.203:5050 ../examples/src/main/python/pimy.py
Mesos executor stderr log on node7:
sh: 1: /home/jianxun/software/spark-1.5.2-bin-hadoop2.6/bin/spark-class: not found
JDK on node3:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~15.10.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
/etc/profile on node3:
export M2_HOME=/usr/share/maven
export M2=$M2_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export PATH=/home/jianxun/software/mongodb-linux-x86_64-3.2.0/bin:$PATH
export HIVE_HOME=/home/jianxun/software/apache-hive-2.0.1-bin
export PATH=$HIVE_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:/usr/share/java/mysql.jar
export SPARK_HOME=/home/jianxun/software/spark-2.0.0-bin-hadoop2.6
JDK on node7:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~15.10.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
/etc/profile on node7:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export SPARK_HOME=/home/jianxun/software/spark-2.0.0-bin-hadoop2.6
export PYTHONPATH=/usr/lib/python2.7
The Mesos version is 0.25. The Mesos master is node3, and there is only one Mesos slave, node7. node3 has two versions of Spark.
Spark configuration on node3:
spark-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/home/jianxun/software/mesos/lib/libmesos-0.25.0.so
export SCALA_HOME=/usr/share/scala-2.11
export SPARK_EXCUTOR_URI=/home/jianxun/software/spark-2.0.0-bin-hadoop2.6.tgz
spark-defaults.conf
spark.local.dir /data/sparktmp
spark.shuffle.service.enabled true
spark.mesos.coarse true
spark.executor.memory 24g
spark.executor.cores 7
spark.cores.max 7
spark.executor.uri /home/jianxun/software/spark-2.0.0-bin-hadoop2.6.tgz
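On Mesos the slave fetches the archive named by spark.executor.uri, unpacks it, and runs spark-class from whatever top-level directory it extracts to, so the contents of this tarball decide which Spark version the executors actually run. A minimal sketch to sanity-check it (the path is copied from the spark-defaults.conf above; this is only a diagnostic idea, not part of the original question):

```shell
#!/bin/sh
# Sketch: verify the archive referenced by spark.executor.uri exists and
# print its top-level entry, which is the directory the executor runs from.
URI=/home/jianxun/software/spark-2.0.0-bin-hadoop2.6.tgz
if [ -f "$URI" ]; then
    # The first entry of the tarball is its top-level directory name.
    tar -tzf "$URI" | head -n 1
else
    echo "missing: $URI"
fi
```

If the first entry printed is anything other than spark-2.0.0-bin-hadoop2.6/, the executors are being handed a different Spark than the driver expects.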
node7 only has the new version of Spark.
spark-submit log (the important parts are marked with ****):
*********************************************************
*********************************************************
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/jianxun/software/spark-1.5.2-bin-hadoop2.6/lib/spark-examples-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/jianxun/software/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/03 12:31:33 INFO SparkContext: Running Spark version 1.5.2
*****************************************************************
*****************************************************************
16/08/03 12:31:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/03 12:31:34 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
16/08/03 12:31:34 INFO SecurityManager: Changing view acls to: jianxun
16/08/03 12:31:34 INFO SecurityManager: Changing modify acls to: jianxun
16/08/03 12:31:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jianxun); users with modify permissions: Set(jianxun)
16/08/03 12:31:34 INFO Slf4jLogger: Slf4jLogger started
16/08/03 12:31:34 INFO Remoting: Starting remoting
16/08/03 12:31:34 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.203:40978]
16/08/03 12:31:34 INFO Utils: Successfully started service 'sparkDriver' on port 40978.
16/08/03 12:31:34 INFO SparkEnv: Registering MapOutputTracker
16/08/03 12:31:34 INFO SparkEnv: Registering BlockManagerMaster
16/08/03 12:31:34 INFO DiskBlockManager: Created local directory at /data/sparktmp/blockmgr-76944d0c-de18-4f52-9249-8c3ca6141f59
16/08/03 12:31:34 INFO MemoryStore: MemoryStore started with capacity 12.4 GB
16/08/03 12:31:34 INFO HttpFileServer: HTTP File server directory is /data/sparktmp/spark-eba79d72-dd11-4d5d-a008-9964522fcc24/httpd-a64948d7-9e78-42f0-b711-84fc5f040517
16/08/03 12:31:34 INFO HttpServer: Starting HTTP Server
16/08/03 12:31:35 INFO Utils: Successfully started service 'HTTP file server' on port 35616.
16/08/03 12:31:35 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/03 12:31:35 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/03 12:31:35 INFO SparkUI: Started SparkUI at http://192.168.1.203:4040
16/08/03 12:31:35 INFO Utils: Copying /home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py to /data/sparktmp/spark-eba79d72-dd11-4d5d-a008-9964522fcc24/userFiles-03a46142-7a44-43d0-82de-10c174721a99/pimy.py
16/08/03 12:31:35 INFO SparkContext: Added file file:/home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py at http://192.168.1.203:35616/files/pimy.py with timestamp 1470198695252
16/08/03 12:31:35 WARN SparkContext: Using SPARK_MEM to set amount of memory to use per executor process is deprecated, please use spark.executor.memory instead.
16/08/03 12:31:35 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
I0803 12:31:35.419636 32575 sched.cpp:164] Version: 0.25.0
I0803 12:31:35.430359 32570 sched.cpp:262] New master detected at master@192.168.1.203:5050
I0803 12:31:35.431447 32570 sched.cpp:272] No credentials provided. Attempting to register without authentication
I0803 12:31:35.434844 32570 sched.cpp:641] Framework registered with ff2cf87e-3712-413f-a452-6d71430527bc-0012
16/08/03 12:31:35 INFO MesosSchedulerBackend: Registered as framework ID ff2cf87e-3712-413f-a452-6d71430527bc-0012
16/08/03 12:31:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41218.
16/08/03 12:31:35 INFO NettyBlockTransferService: Server created on 41218
16/08/03 12:31:35 INFO BlockManagerMaster: Trying to register BlockManager
16/08/03 12:31:35 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.203:41218 with 12.4 GB RAM, BlockManagerId(driver, 192.168.1.203, 41218)
16/08/03 12:31:35 INFO BlockManagerMaster: Registered BlockManager
16/08/03 12:31:36 INFO SparkContext: Starting job: reduce at /home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py:38
16/08/03 12:31:36 INFO DAGScheduler: Got job 0 (reduce at /home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py:38) with 2 output partitions
16/08/03 12:31:36 INFO DAGScheduler: Final stage: ResultStage 0(reduce at /home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py:38)
16/08/03 12:31:36 INFO DAGScheduler: Parents of final stage: List()
16/08/03 12:31:36 INFO DAGScheduler: Missing parents: List()
16/08/03 12:31:36 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py:38), which has no missing parents
16/08/03 12:31:36 INFO MemoryStore: ensureFreeSpace(4272) called with curMem=0, maxMem=13335873454
16/08/03 12:31:36 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.2 KB, free 12.4 GB)
16/08/03 12:31:36 INFO MemoryStore: ensureFreeSpace(2792) called with curMem=4272, maxMem=13335873454
....
....
16/08/03 12:31:37 INFO DAGScheduler: Job 0 failed: reduce at /home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py:38, took 1.002633 s
Traceback (most recent call last):
File "/home/jianxun/software/spark-2.0.0-bin-hadoop2.6/./examples/src/main/python/pimy.py", line 38, in <module>
count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
File "/home/jianxun/software/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 799, in reduce
File "/home/jianxun/software/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 773, in collect
File "/home/jianxun/software/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/home/jianxun/software/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError16/08/03 12:31:37 INFO DAGScheduler: Executor lost: ff2cf87e-3712-413f-a452-6d71430527bc-S4 (epoch 3)
: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, node7): ExecutorLostFailure (executor ff2cf87e-3712-413f-a452-6d71430527bc-S4lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:909)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.collect(RDD.scala:908)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
16/08/03 12:31:37 INFO BlockManagerMasterEndpoint: Trying to remove executor ff2cf87e-3712-413f-a452-6d71430527bc-S4 from BlockManagerMaster.
16/08/03 12:31:37 INFO BlockManagerMaster: Removed ff2cf87e-3712-413f-a452-6d71430527bc-S4 successfully in removeExecutor
16/08/03 12:31:37 INFO DAGScheduler: Host added was in lost list earlier: node7
16/08/03 12:31:37 INFO SparkContext: Invoking stop() from shutdown hook
16/08/03 12:31:37 INFO SparkUI: Stopped Spark web UI at http://192.168.1.203:4040
16/08/03 12:31:37 INFO DAGScheduler: Stopping DAGScheduler
I0803 12:31:37.146209 32592 sched.cpp:1771] Asked to stop the driver
I0803 12:31:37.146414 32573 sched.cpp:1040] Stopping framework 'ff2cf87e-3712-413f-a452-6d71430527bc-0012'
16/08/03 12:31:37 INFO MesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
16/08/03 12:31:37 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/03 12:31:37 INFO MemoryStore: MemoryStore cleared
16/08/03 12:31:37 INFO BlockManager: BlockManager stopped
16/08/03 12:31:37 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/03 12:31:37 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/03 12:31:37 INFO SparkContext: Successfully stopped SparkContext
16/08/03 12:31:37 INFO ShutdownHookManager: Shutdown hook called
16/08/03 12:31:37 INFO ShutdownHookManager: Deleting directory /data/sparktmp/spark-eba79d72-dd11-4d5d-a008-9964522fcc24/pyspark-02048aa7-deaf-4af5-adde-86732cd44324
16/08/03 12:31:37 INFO ShutdownHookManager: Deleting directory /data/sparktmp/spark-eba79d72-dd11-4d5d-a008-9964522fcc24
mesos.WARNING log on node7:
Log file created at: 2016/08/03 12:31:36
Running on machine: node7
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0803 12:31:36.408701 5686 containerizer.cpp:988] Ignoring update for unknown container: 9910a15a-ec96-4e5a-91b9-58652b2bcaa5
W0803 12:31:36.409050 5686 containerizer.cpp:988] Ignoring update for unknown container: 9910a15a-ec96-4e5a-91b9-58652b2bcaa5
W0803 12:31:36.613108 5687 containerizer.cpp:988] Ignoring update for unknown container: 108436bb-429b-4214-9d9b-9fa452383093
W0803 12:31:36.613817 5691 containerizer.cpp:988] Ignoring update for unknown container: 108436bb-429b-4214-9d9b-9fa452383093
W0803 12:31:36.807909 5692 containerizer.cpp:988] Ignoring update for unknown container: 5c9abbdb-ee6a-4175-8087-d6d1dd1bd5ea
W0803 12:31:36.808281 5692 containerizer.cpp:988] Ignoring update for unknown container: 5c9abbdb-ee6a-4175-8087-d6d1dd1bd5ea
W0803 12:31:37.019579 5687 containerizer.cpp:988] Ignoring update for unknown container: 7a11174e-7774-453c-bdf7-5cbb5b4afcfa
W0803 12:31:37.020051 5693 containerizer.cpp:988] Ignoring update for unknown container: 7a11174e-7774-453c-bdf7-5cbb5b4afcfa
W0803 12:31:37.142438 5690 slave.cpp:1995] Cannot shut down unknown framework ff2cf87e-3712-413f-a452-6d71430527bc-0012
Answer (score: 0)
Source the /etc/profile file.