我正在尝试使用Python连接到蜂巢。我安装了所需的所有依赖项(sasl,thrift_sasl等)。
这是我尝试连接的方式:
configuration = {"hive.server2.authentication.kerberos.principal" : "hive/_HOST@REALM_HOST", "hive.server2.authentication.kerberos.keytab" : "/etc/security/keytabs/hive.service.keytab"}
connection = hive.Connection(configuration = configuration, host="host", port=port, auth="KERBEROS", kerberos_service_name = "hiveserver2")
但是我得到这个错误:
次要代码可能会提供更多信息(找不到领域“ REALM_DOMAIN”的KDC)
我想念的是谁?有人有使用pyHive
的{{1}}连接的示例吗?
谢谢您的帮助。
答案 0 :(得分:0)
我在pyspark中不知道,但是我正在使用以下scala代码,并且自去年起就可以使用了。如果可以在python中更改此代码。根据您的kerberos替换属性的值。
System.setProperty("hive.metastore.uris", "add hive.metastore.uris url");
System.setProperty("hive.metastore.sasl.enabled", "true")
System.setProperty("hive.metastore.kerberos.keytab.file", "add keytab")
System.setProperty("hive.security.authorization.enabled", "false")
System.setProperty("hive.metastore.kerberos.principal", "replace hive.metastore.kerberos.principal value")
System.setProperty("hive.metastore.execute.setugi", "true")
val hiveContext = new HiveContext(sparkContext)
答案 1 :(得分:0)
谢谢@Kishore。 实际上,在PySpark中,代码如下所示:
<div class="rolloverloaded hidden”>
<img class="" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-4-360x259.jpg" style="z-index: 0;">
<img class="" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-2-360x259.jpg" style="z-index: 0;">
<img class="" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-1-360x259.jpg" style="z-index: 0;">
<img class="" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-6-360x259.jpg" style="z-index: 0;">
<img class="fadeIn" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-7-360x259.jpg" style="z-index: 7;">
<img class="fadeIn" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-3-360x259.jpg" style="z-index: 8;">
<img class="fadeIn" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-8-360x259.jpg" style="z-index: 9;">
<img class="fadeIn" id="dynamic" src="/wp-content/uploads/2018/06/blue-moon-creative-hamilton-self-storage-10-360x259.gif" style="z-index: 10;">
</div>
,您可以使用:
import pyspark
from pyspark import SparkContext
from pyspark.sql import Row
from pyspark import SparkConf
from pyspark.sql import HiveContext
from pyspark.sql import functions as F
import pyspark.sql.types as T
def connection(self):
conf = pyspark.SparkConf()
conf.setMaster('yarn-client')
sc = pyspark.SparkContext(conf=conf)
self.cursor = HiveContext(sc)
self.cursor.setConf("hive.exec.dynamic.partition", "true")
self.cursor.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
self.cursor.setConf("hive.warehouse.subdir.inherit.perms", "true")
self.cursor.setConf('spark.scheduler.mode', 'FAIR')
我实际上是通过命令运行代码的:
rows = self.cursor.sql("SELECT someone FROM something")
for row in rows.collect():
print row
我想您可以使用运行pyspark的python来运行
spark-submit --master yarn MyProgram.py
但是我没有尝试过,所以我不能保证它能正常工作