只是想澄清一下 spark-submit --keytab --principal&& --proxy-user参数可以共存吗?
我们要求以真正的业务用户身份提交作业,但用户在hadoop kdc中没有委托人。
每当使用proxy-user和kerberos将它们放在一起时,我就会收到异常。
17/02/09 13:51:43 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 379 for atlas on 10.12.118.92:8020
Exception in thread "main" java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:888)
at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:8
at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2243)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1293)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.take(RDD.scala:1288)
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1328)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.first(RDD.scala:1327)
at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:269)
at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:265)
at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:242)
at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:74)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:171)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:44)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at org.sandbox.Main$.main(Main.scala:39)
at org.sandbox.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:161)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:870)
... 57 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: Forbidde
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274)
at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
答案 0 :(得分:3)
制作spark-submit --keytab时, - 主要&amp;&amp; --proxy-user参数不能一起使用。
如果一起使用,则会通过以下错误提交:
Spark提交失败: 错误:只能提供--proxy-user或--principal中的一个。
答案 1 :(得分:0)
1)同时--proxy-user
和--principal
can't be passed together to spark-submit
。但是,您可以初始化为kerberos用户并在proxy-user下启动spark-job:
kinit -kt USER.keytab USER && spark-submit --proxy-user PROXY-USER
*如果您使用带有配置单元的spark +确保正确配置hadoop.proxyuser.USER.{hosts,groups}
,则无效。
答案 2 :(得分:0)
我可以使用spark submit一起使用--proxy-user, - primcipal和--keytab。上述问题是由于对KMS Ranger的DELEGATIONTOKEN请求权限。
所以我在“自定义kms网站”中添加了以下条目,以使其有效。
hadoop.kms.proxyuser.xxx.users=*
hadoop.kms.proxyuser.xxx.hosts=*