I get an UnknownHostException when running a custom jar with Spark on Mesos. The problem does not occur when running spark-shell.
My spark-env.sh contains the following:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
My spark-defaults.conf contains the following:
spark.master mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home /spark-1.5.0-bin-hadoop2.6/
These settings are present on all masters and slaves.
Starting spark-shell as follows and running the line below works fine:
/spark-1.5.0-bin-hadoop2.6/bin/spark-shell
sc.textFile("/tmp/Input").collect.foreach(println)
spark-shell log:
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(88528) called with curMem=0, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 86.5 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(20236) called with curMem=88528, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.8 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.21.104:49048 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:49 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:22
15/09/28 20:04:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/28 20:04:49 INFO spark.SparkContext: Starting job: collect at <console>:22
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:22) with 3 output partitions
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(collect at <console>:22)
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Missing parents: List()
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at textFile at <console>:22), which has no missing parents
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(3120) called with curMem=108764, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(1784) called with curMem=111884, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1784.0 B, free 530.2 MB)
15/09/28 20:04:49 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.21.104:49048 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:49 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at textFile at <console>:22)
15/09/28 20:04:49 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-37-82.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-172-31-21-104.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, ip-172-31-4-4.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-4-4.us-west-2.compute.internal:50648 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S2, ip-172-31-4-4.us-west-2.compute.internal, 50648)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-37-82.us-west-2.compute.internal:52624 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S1, ip-172-31-37-82.us-west-2.compute.internal, 52624)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-21-104.us-west-2.compute.internal:56628 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S0, ip-172-31-21-104.us-west-2.compute.internal, 56628)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-37-82.us-west-2.compute.internal:52624 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-21-104.us-west-2.compute.internal:56628 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-4-4.us-west-2.compute.internal:50648 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-37-82.us-west-2.compute.internal:52624 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-21-104.us-west-2.compute.internal:56628 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-4-4.us-west-2.compute.internal:50648 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3907 ms on ip-172-31-37-82.us-west-2.compute.internal (1/3)
15/09/28 20:04:53 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 3884 ms on ip-172-31-4-4.us-west-2.compute.internal (2/3)
15/09/28 20:04:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 3907 ms on ip-172-31-21-104.us-west-2.compute.internal (3/3)
15/09/28 20:04:53 INFO scheduler.DAGScheduler: ResultStage 0 (collect at <console>:22) finished in 3.940 s
15/09/28 20:04:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/09/28 20:04:53 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:22, took 4.019454 s
pepsi
cocacola
The following sample code, compiled into a jar, fails.
Sample code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    sc.textFile("/tmp/Input").collect.foreach(println)
  }
}
Run with:
/spark-1.5.0-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" /home/hdfs/test_2.10-0.1.jar
spark-submit log:
java.lang.IllegalArgumentException: java.net.UnknownHostException: affinio
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: affinio
... 35 more
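The `createNonHAProxy` frame in the trace above suggests that the HDFS client on the executor did not see the HA settings and so treated `affinio` as an ordinary hostname. The following is a minimal sketch of that decision, not Hadoop's actual code: it assumes the client picks the HA path only when a `dfs.client.failover.proxy.provider.<nameservice>` entry is present in the loaded configuration. The object and method names are hypothetical.

```scala
// Simplified model (NOT Hadoop's real implementation) of how an HDFS client
// might decide whether the authority in hdfs://<authority> is a logical
// HA nameservice or a plain hostname.
object HaLookupModel {
  // Prefix of the real client property, as it appears in hdfs-site.xml.
  val ProviderKeyPrefix = "dfs.client.failover.proxy.provider."

  // True when the loaded configuration names a failover proxy provider
  // for the given authority.
  def isHaNameservice(conf: Map[String, String], authority: String): Boolean =
    conf.contains(ProviderKeyPrefix + authority)

  // Human-readable summary of the decision.
  def describe(conf: Map[String, String], authority: String): String =
    if (isHaNameservice(conf, authority)) s"$authority: logical HA nameservice"
    else s"$authority: plain hostname, resolved through DNS"

  def main(args: Array[String]): Unit = {
    // Configuration as it looks when hdfs-site.xml has been loaded.
    val withHdfsSite = Map(
      "dfs.nameservices" -> "affinio",
      ProviderKeyPrefix + "affinio" ->
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
    // Configuration as it looks when only fs.defaultFS is known.
    val withoutHdfsSite = Map("fs.defaultFS" -> "hdfs://affinio")

    println(describe(withHdfsSite, "affinio"))
    println(describe(withoutHdfsSite, "affinio"))
  }
}
```

In the second case the name `affinio` would go to DNS, which is consistent with the UnknownHostException reported here.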
hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>affinio</value>
</property>
<property>
<name>dfs.ha.namenodes.affinio</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.affinio.nn1</name>
<value>172.31.16.81:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.affinio.nn2</name>
<value>172.31.32.81:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.affinio.nn1</name>
<value>172.31.16.81:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.affinio.nn2</name>
<value>172.31.32.81:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>file:///nfs/dfs/ha-name-dir-shared</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.affinio</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/namenode</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hdfs</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.31.16.81:2181,172.31.32.81:2181,172.31.0.81:2181</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://affinio</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
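One possible way to take the XML files out of the equation is to set the client-side HA properties programmatically. The sketch below only collects the values from the two files above and pushes them through a setter function; the object name `HaSettings` and the method `applyTo` are hypothetical, and whether this avoids the error on Spark 1.5.0 with Mesos is untested.

```scala
// Client-side HA properties copied from the hdfs-site.xml and core-site.xml
// shown above, applied through a caller-supplied setter.
object HaSettings {
  val settings: Seq[(String, String)] = Seq(
    "fs.defaultFS" -> "hdfs://affinio",
    "dfs.nameservices" -> "affinio",
    "dfs.ha.namenodes.affinio" -> "nn1,nn2",
    "dfs.namenode.rpc-address.affinio.nn1" -> "172.31.16.81:8020",
    "dfs.namenode.rpc-address.affinio.nn2" -> "172.31.32.81:8020",
    "dfs.client.failover.proxy.provider.affinio" ->
      "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

  // Pushes every setting through the supplied setter. With Spark on the
  // classpath the setter could be sc.hadoopConfiguration.set(_, _).
  def applyTo(set: (String, String) => Unit): Unit =
    settings.foreach { case (k, v) => set(k, v) }

  def main(args: Array[String]): Unit = {
    // Stand-in target so the sketch runs without Spark or Hadoop.
    val target = scala.collection.mutable.LinkedHashMap.empty[String, String]
    applyTo((k, v) => target.update(k, v))
    target.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

In the application, the call would sit between creating the SparkContext and the first `sc.textFile`, for example `HaSettings.applyTo(sc.hadoopConfiguration.set(_, _))`.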
spark-shell conf.toDebugString
spark.app.id=20150929-173220-1361059756-5050-16026-0005
spark.app.name=Spark shell
spark.driver.host=172.31.25.67
spark.driver.port=37613
spark.executor.id=driver
spark.externalBlockStore.folderName=spark-d4bf255f-f1f3-4026-83bf-b377a24f5f2c
spark.fileserver.uri=http://172.31.25.67:54526
spark.jars=
spark.master=mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home=/spark-1.5.0-bin-hadoop2.6/
spark.repl.class.uri=http://172.31.25.67:45553
spark.submit.deployMode=client
spark-submit conf.toDebugString
spark.app.id=20150929-173220-1361059756-5050-16026-0004
spark.app.name=Simple Application
spark.driver.host=172.31.25.67
spark.driver.port=47968
spark.executor.id=driver
spark.externalBlockStore.folderName=spark-846de0d9-8bb1-414b-8b81-f2d6646a58d3
spark.fileserver.uri=http://172.31.25.67:45283
spark.jars=file:/home/hdfs/./test_2.10-0.1.jar
spark.master=mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home=/spark-1.5.0-bin-hadoop2.6/
spark.submit.deployMode=client
If I run it as follows, I am able to get it to work:
spark-submit --files /hadoop-2.7.1/etc/hadoop/hdfs-site.xml,/hadoop-2.7.1/etc/hadoop/core-site.xml ./test_2.10-0.1.jar
So the configuration is not loaded by default, even though HADOOP_CONF_DIR is set to /hadoop-2.7.1/etc/hadoop/ on all machines, both in /spark-1.5.0-bin-hadoop2.6/conf/spark-env.sh and in the user profile:
cat /etc/profile.d/hadoop.sh
# Set path for hadoop
export HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
export PATH=$PATH:/hadoop-2.7.1/bin
Output of the --verbose switch:
System properties:
spark.local.dir -> /data/spark/
SPARK_SUBMIT -> true
spark.files -> file:///hadoop-2.7.1/etc/hadoop/hdfs-site.xml,file:///hadoop-2.7.1/etc/hadoop/core-site.xml
spark.app.name -> SimpleApp
spark.jars -> file:/home/hdfs/./test_2.10-0.1.jar
spark.submit.deployMode -> client
spark.mesos.executor.home -> /spark-1.5.0-bin-hadoop2.6
spark.master -> mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
Classpath elements:
file:/home/hdfs/./test_2.10-0.1.jar
I also had the application print the environment variables from the executor:
sc.parallelize(Array(1)).flatMap(_ => sys.env).collect.foreach(v => println(s"${v._1}=${v._2}"))
Output:
LIBPROCESS_PORT=0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
SPARK_EXECUTOR_MEMORY=1024m
SHLVL=1
MESOS_EXECUTOR_ID=20150930-115952-1361059756-5050-15990-S1
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
MESOS_DIRECTORY=/data/slaves/20150930-115952-1361059756-5050-15990-S1/frameworks/20150930-115952-1361059756-5050-15990-0008/executors/20150930-115952-1361059756-5050-15990-S1/runs/2baa786a-be89-4823-a248-bb35034bb2fa
MESOS_SLAVE_PID=slave(1)@172.31.32.118:5051
_SPARK_ASSEMBLY=/spark-1.5.0-bin-hadoop2.6/lib/spark-assembly-1.5.0-hadoop2.6.0.jar
SPARK_HOME=/spark-1.5.0-bin-hadoop2.6
MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-0.24.0.so
SPARK_SCALA_VERSION=2.10
SPARK_USER=hdfs
PWD=/data/slaves/20150930-115952-1361059756-5050-15990-S1/frameworks/20150930-115952-1361059756-5050-15990-0008/executors/20150930-115952-1361059756-5050-15990-S1/runs/2baa786a-be89-4823-a248-bb35034bb2fa
SPARK_ENV_LOADED=1
MESOS_FRAMEWORK_ID=20150930-115952-1361059756-5050-15990-0008
MESOS_SLAVE_ID=20150930-115952-1361059756-5050-15990-S1
MESOS_CHECKPOINT=0
HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
SPARK_EXECUTOR_OPTS=
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
So we can see that the executors do have HADOOP_CONF_DIR in their environment, but it still does not work unless spark.files is used.
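Hadoop's Configuration class loads core-site.xml and hdfs-site.xml as classpath resources, so the environment variable alone does not guarantee the files are visible to the JVM. A minimal sketch of a probe that checks the classpath directly follows; `ClasspathProbe` and its methods are hypothetical names, and the object would need to be packaged in the application jar to run on an executor.

```scala
// Checks whether a named resource is visible to the current class loader.
object ClasspathProbe {
  // Returns the URL of the resource as a string, or None if it is not visible.
  def locate(resource: String): Option[String] = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(ClassLoader.getSystemClassLoader)
    Option(loader.getResource(resource)).map(_.toString)
  }

  // One-line summary suitable for printing or collecting from a task.
  def report(resource: String): String =
    s"$resource -> ${locate(resource).getOrElse("not on classpath")}"

  def main(args: Array[String]): Unit =
    Seq("core-site.xml", "hdfs-site.xml").foreach(r => println(report(r)))
}
```

Inside the job, something like `sc.parallelize(Seq(1)).map(_ => ClasspathProbe.report("hdfs-site.xml")).collect.foreach(println)` would run the probe on an executor instead of the driver.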
Update:
After downgrading to spark-1.3.1 the problem goes away. Something in spark-1.5 seems to have broken the classpath.
Spark-1.3.1 output:
System properties:
SPARK_SUBMIT -> true
spark.app.name -> SimpleApp
spark.jars -> file:/home/hdfs/./test_2.10-0.1.jar
spark.mesos.executor.home -> /spark-1.3.1-bin-hadoop2.6
spark.master -> mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
Classpath elements:
file:/home/hdfs/./test_2.10-0.1.jar
Executor environment:
LIBPROCESS_PORT=0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
SPARK_EXECUTOR_MEMORY=512m
SHLVL=1
MESOS_EXECUTOR_ID=20150930-115952-1361059756-5050-15990-S2
CLASSPATH=/spark-1.3.1-bin-hadoop2.6/conf:/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/spark-1.3.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/spark-1.3.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/spark-1.3.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/hadoop-2.7.1/etc/hadoop
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
MESOS_DIRECTORY=/data/slaves/20150930-115952-1361059756-5050-15990-S2/frameworks/20150930-115952-1361059756-5050-15990-0013/executors/20150930-115952-1361059756-5050-15990-S2/runs/23c38710-14d7-4550-b3f7-2879576ce1d2
MESOS_SLAVE_PID=slave(1)@172.31.18.189:5051
PYTHONPATH=/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip:/spark-1.3.1-bin-hadoop2.6/python:
SPARK_HOME=/spark-1.3.1-bin-hadoop2.6
SPARK_CONF_DIR=/spark-1.3.1-bin-hadoop2.6/conf
MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-0.24.0.so
SPARK_SCALA_VERSION=2.10
SPARK_USER=hdfs
PWD=/data/slaves/20150930-115952-1361059756-5050-15990-S2/frameworks/20150930-115952-1361059756-5050-15990-0013/executors/20150930-115952-1361059756-5050-15990-S2/runs/23c38710-14d7-4550-b3f7-2879576ce1d2
SPARK_ENV_LOADED=1
MESOS_FRAMEWORK_ID=20150930-115952-1361059756-5050-15990-0013
MESOS_SLAVE_ID=20150930-115952-1361059756-5050-15990-S2
MESOS_CHECKPOINT=0
HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop
SPARK_EXECUTOR_OPTS=
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat