我们正尝试使用 MIT KDC 和 Ranger 在跨领域身份验证启用的两个群集之间进行数据传输。< / p>
DistCP
正在运行,没有任何问题。但是,集群A中的Spark应用程序应该将数据写入集群B HDFS(kerberised)不起作用。
应用程序在LOCAL模式下工作,对集群B的HDFS的写入工作。但是当我们尝试在YARN-CLUSTER模式下运行时,它会失败
AcessControlException (Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]).
在调试时,观察到创建的FileSystem
对象在纱线群集模式下具有 SIMPLE身份验证,并且pricipal只是用户名,而LOCAL模式下的KERBEROS身份验证和主体是适当的校长。
我们知道Yarn委托令牌启动Executors和Driver。我们不确定我们是否缺少spark或hdfs或yarn中的任何配置。
由于它是Cross-Realm KDS,我使用cluser A的Principal和Keytab来提交spark应用程序。
在我们在spark submit中启用的一些属性之下。
SPARK:
spark.yarn.access.namenodes=hdfs://mycluster02
spark.authenticate=true
spark.yarn.access.hadoopFileSystems=hdfs://mycluster02
spark.yarn.principal=username@DOMAIN.COM
spark.yarn.keytab=user.keytab
纱:
hadoop.registry.client.auth=kerberos
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled=false
和所有其他纱线kerberos原则作为keytabs也设置。
HDFS:
hadoop.security.authentication=kerberos
and all basic configuraion on kerberos enabling.
下面是从在executor上运行的应用程序复制的相同代码,用于创建文件System对象。
Configuration conf = new Configuration();
conf.addResource(new Path(args[0] + "/core-site.xml"));
conf.addResource(new Path(args[0] + "/hdfs-site.xml")); list and do distcp
conf.set("hadoop.security.authentication", "kerberos");
FileSystem fs = FileSystem.get(conf);
FileStatus[] fsStatus = fs.listStatus(new Path("/"));
spark-submit --name "HDFS_APP_DATA" --master yarn-cluster --conf "spark.yarn.access.namenodes=hdfs://mycluster02" --conf "spark.authenticate=true" --conf "spark.yarn.access.hadoopFileSystems=hdfs://mycluster02" --conf "spark.yarn.principal=user@EXAMPLE.COM" --conf "spark.yarn.keytab=/home/user/hdfs_test/user.princ.keytab" --driver-memory 2g --executor-memory 3g --num-executors 1 --executor-cores 1 --class com.test.spark.kafka.batch.util.HDFSApp spark-batch-util-jar-with-dependencies.jar /config_files/
例外:
18/05/23 16:15:38 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hostname.org, partition 1,PROCESS_LOCAL, 2092 bytes)
18/05/23 16:15:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hostname.org): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "hostname.org/xx.xx.xx.xx"; destination host is: "hostname.org":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:785)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy12.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:625)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy13.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2126)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:919)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992)
at com.example.spark.kafka.batch.util.HDFSApp$1.call(HDFSApp.java:51)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$34.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$34.apply(RDD.scala:919)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1857)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1857)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:720)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 34 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:758)
... 37 more