PySpark: slave node cannot access HDFS (Kerberos)

Date: 2018-07-18 01:27:32

Tags: apache-spark hadoop pyspark hdfs kerberos

I have an HDFS cluster. YARN is configured as well. I can access HDFS from the command line, etc. ... On the ResourceManager I started the PySpark master, and on one of the data nodes I started a PySpark slave (pointing at the master). Kerberos is handling authentication for the HDFS cluster.

I have generated a personal keytab file.
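(The keytab itself seems fine: one quick sanity check is to obtain a ticket with it. A minimal sketch using the standard MIT Kerberos tools from Python; it assumes kinit and klist are on the PATH, and uses the keytab path and principal from my submit command below.)

    # Sanity-check the keytab: obtain a ticket with it, then list it.
    import subprocess

    subprocess.run(
        ["kinit", "-kt", "../bin/MyUsername.keytab", "MyUsername@DOMAIN"],
        check=True,  # raise if kinit fails, i.e. the keytab is unusable
    )
    subprocess.run(["klist"], check=True)  # show the ticket just obtained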

When I submit the job with:

    ../bin/spark-submit --master yarn --deploy-mode client \
        --keytab ../bin/MyUsername.keytab --principal MyUsername@DOMAIN \
        ../../word_count_spark.py
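(For reference, word_count_spark.py is a plain word count. A minimal sketch of it; the HDFS input path is an assumption on my part, though the namenode address matches the error below.)

    # word_count_spark.py -- minimal sketch; the input path is an assumption
    from operator import add
    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("word_count_spark"))

    # The executor-side read of this file is where the tasks fail.
    lines = sc.textFile("hdfs://MASTER-HOSTNAME:9000/user/MyUsername/input.txt")
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(add))
    print(counts.collect())
    sc.stop()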

It seems the slave node cannot authenticate via Kerberos in order to retrieve files from the Hadoop cluster (HDFS). I have read before that the master passes the slave nodes its Kerberos credentials for their use, but...
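(If it matters: my understanding is that in Spark 2.x the --keytab and --principal flags map to the spark.yarn.keytab and spark.yarn.principal configuration keys, so the same thing can also be set from the driver script in yarn-client mode. A minimal sketch; the absolute keytab path is an assumption.)

    # Equivalent in-script configuration (my sketch; Spark 2.x, yarn-client mode).
    # spark.yarn.keytab / spark.yarn.principal are the config keys that the
    # --keytab / --principal flags set; the absolute path is an assumption.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("word_count_spark")
            .set("spark.yarn.keytab", "/home/MyUsername/MyUsername.keytab")
            .set("spark.yarn.principal", "MyUsername@DOMAIN"))
    sc = SparkContext(conf=conf)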

Any clues would be greatly appreciated! This is the error I get:


2018-07-18 12:51:20 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, 130.195.4.131, executor 0): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "SLAVE-HOSTNAME/130.19X.X.13X"; destination host is: "MASTER-HOSTNAME":9000;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:257)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:256)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:99)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)

0 Answers:

No answers yet.