Question

We want to connect our SQL Server 2016 Enterprise instance to our Kerberized on-premises Hadoop cluster running Cloudera 5.14 via PolyBase.

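I followed the Microsoft PolyBase Guide to configure PolyBase. After working on this for a few days, I cannot get past the following exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]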

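Microsoft provides a built-in diagnostic tool for troubleshooting the connectivity with PolyBase and Kerberos. Microsoft's troubleshooting guide defines four checkpoints, and I am stuck at checkpoint 4. A short summary of the checkpoints, including the ones I passed: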

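Checkpoint 1: Passed. SQL Server authenticated against the KDC and received a TGT.
Checkpoint 2: Passed. As the troubleshooting guide describes, PolyBase attempts to access HDFS and fails because the request does not contain the necessary service ticket.
Checkpoint 3: Passed. The second hex dump indicates that SQL Server successfully used the TGT and acquired the applicable service ticket for the name node's SPN from the KDC.
Checkpoint 4: Not passed. At this checkpoint, Hadoop should authenticate SQL Server using the ST (service ticket) and grant a session to access the secured resource.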

krb5.conf file

[libdefaults]
default_realm = COMPANY.REALM.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
admin_server = ipadress.kdc.host
}
[logging]
default = FILE:/var/log/krb5/kdc.log
kdc = FILE:/var/log/krb5/kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log

core-site.xml for PolyBase on SQL Server

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>2</value>
  </property>
  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <value>2</value>
  </property>

<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
    <name>polybase.kerberos.realm</name>
    <value>COMPANY.REALM.COM</value>
  </property>
  <property>
    <name>polybase.kerberos.kdchost</name>
    <value>ipadress.kdc.host</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>KERBEROS</value>
  </property>
</configuration>
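
As a quick sanity check of the PolyBase-side file above, the three Kerberos-related properties can be read back programmatically. The following is only a minimal sketch: the helper names `load_hadoop_properties` and `missing_kerberos_properties` are made up for illustration, and the list of required properties is simply the set that appears in this core-site.xml, not an official or complete list.

```python
import xml.etree.ElementTree as ET

# Property names copied from the core-site.xml shown above (an assumption,
# not an exhaustive list of what PolyBase may need).
REQUIRED_KERBEROS_PROPERTIES = (
    "polybase.kerberos.realm",
    "polybase.kerberos.kdchost",
    "hadoop.security.authentication",
)


def load_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_kerberos_properties(props):
    """List the required properties that are absent or have an empty value."""
    return [n for n in REQUIRED_KERBEROS_PROPERTIES if not props.get(n)]
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `load_hadoop_properties`, so that the parser handles the encoding declaration itself. An empty list from `missing_kerberos_properties` only says the properties are present, not that their values match the cluster.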

hdfs-site.xml for PolyBase on SQL Server

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>268435456</value> 
  </property>
  <!-- Client side file system caching is disabled below for credential refresh and 
       setting the below cache disabled options to true might result in 
       stale credentials when an alter credential or alter datasource is performed
  -->
  <property>
    <name>fs.wasb.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.wasbs.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.asv.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.asvs.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
  </property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@COMPANY.REALM.COM</value> 
  </property>
</configuration>

PolyBase exception

[2018-06-22 12:51:50,349] WARN  2872[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:53,568] WARN  6091[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:56,127] WARN  8650[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:58,998] WARN 11521[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:59,139] WARN 11662[main] - org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:676) - Couldn't setup connection for hdfs@COMPANY.REALM.COM to IPADRESS_OF_NAMENODE:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Log entry on the NameNode

Socket Reader #1 for port 8020: readAndProcess from client IP-ADRESS_SQL-SERVER threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]

Auth failed for IP-ADRESS_SQL-SERVER:60484:null (GSS initiate failed) with true cause: (GSS initiate failed)

The part that confuses me is the NameNode log entry, because AES128 CTS mode with HMAC SHA1-96 is already in the list of permitted encryption types, as shown in krb5.conf and in the Cloudera Manager UI.
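
One way to verify this claim on a given machine is to read `permitted_enctypes` straight from the krb5.conf on that machine's disk. The sketch below is an illustration only: the function names are hypothetical, it handles just the flat `key = value` layout used in the file above (not nested `{ ... }` blocks or include directives), and it says nothing about which file an already running JVM loaded at startup.

```python
def permitted_enctypes(krb5_conf_text):
    """Return the permitted_enctypes entries from the [libdefaults] section."""
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "permitted_enctypes":
                # Entries may be separated by spaces and/or commas.
                return value.replace(",", " ").split()
    return []


def is_permitted(krb5_conf_text, enctype):
    """Case-insensitive membership test against permitted_enctypes."""
    allowed = [e.lower() for e in permitted_enctypes(krb5_conf_text)]
    return enctype.lower() in allowed
```

Running `is_permitted(text, "aes128-cts-hmac-sha1-96")` against the file from each node, rather than only the copy shown in a management UI, shows whether every node actually has the setting on disk.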

We would appreciate any help!

Answer 1

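The problem was resolved after restarting the cluster. I think the cause was that, because some services were still running, the krb5.conf file could not be distributed to all nodes in our Hadoop cluster. Cloudera Manager also showed a warning about a stale Kerberos configuration. Thanks a lot, everyone!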
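
If you suspect the same situation, one rough way to spot nodes with an outdated krb5.conf is to collect the file from every node and compare digests. This is a sketch under assumptions, not something from the original setup: how the files are collected is left out, the host names in the example are hypothetical, and treating the largest group as the reference is a heuristic (with a tie, the choice is arbitrary).

```python
import hashlib
from collections import defaultdict


def group_hosts_by_digest(configs):
    """Map SHA-256 digest -> hosts whose krb5.conf content has that digest.

    `configs` maps a host name to the text of that host's krb5.conf.
    Trailing whitespace is ignored so cosmetic differences do not count.
    """
    groups = defaultdict(list)
    for host, text in configs.items():
        normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(host)
    return dict(groups)


def divergent_hosts(configs):
    """Return hosts whose file differs from the most common version."""
    groups = group_hosts_by_digest(configs)
    if len(groups) <= 1:
        return []
    majority = max(groups.values(), key=len)
    return sorted(
        host
        for hosts in groups.values()
        if hosts is not majority
        for host in hosts
    )
```

For example, `divergent_hosts({"node1": text1, "node2": text2, "node3": text3})` lists the nodes whose copy differs from the majority. Services on those nodes may still need a restart to pick up a corrected file, as was the case here.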
