https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.
When I add this to /etc/profile:
export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py
I can then do the imports listed in the link, with the exception of from hive import ThriftHive, which actually needs to be:
from hive_service import ThriftHive
Next, the port in the example was 10000, which caused the program to hang when I tried it. The default Hive Thrift port is 9083, which stopped the hanging.
So I set it up like this:
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
try:
transport = TSocket.TSocket('<node-with-metastore>', 9083)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)
transport.open()
client.execute("CREATE TABLE test(c1 int)")
transport.close()
except Thrift.TException, tx:
print '%s' % (tx.message)
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute
self.recv_execute()
File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute
raise x
thrift.Thrift.TApplicationException: Invalid method name: 'execute'
But inspecting the ThriftHive.py file shows that the execute method does exist within the Client class.
How can I use Python to access Hive?
Answer 0 (score: 39)
I believe the easiest way is to use PyHive.
To install you'll need these libraries:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
Please note that although you install the library as PyHive, you import the module as pyhive, all lowercase.
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager your distribution uses. For Windows there are some options on GNU.org; you can download a binary installer. On a Mac, SASL should be available if you've installed the xcode developer tools (xcode-select --install in Terminal).
After installation, you can connect to Hive like this:
from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
Now that you have the hive connection, you have options in how you use it. You can just straight-up query:
cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
use_result(result)
...or use the connection to make a Pandas dataframe:
import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
Answer 1 (score: 24)
I assert that you are using HiveServer2, which is the reason the code doesn't work.
You can use pyhs2 to access your Hive correctly; the example code goes like this:
import pyhs2
with pyhs2.connect(host='localhost',
port=10000,
authMechanism="PLAIN",
user='root',
password='test',
database='default') as conn:
with conn.cursor() as cur:
#Show databases
print cur.getDatabases()
#Execute query
cur.execute("select * from table")
#Return column info from query
print cur.getSchema()
#Fetch table results
for i in cur.fetch():
print i
Note that you may need to install python-devel.x86_64 and cyrus-sasl-devel.x86_64 before installing pyhs2 with pip.
Hope this helps you.
Answer 2 (score: 13)
The python program below should work to access hive tables from python:
import commands
cmd = "hive -S -e 'SELECT * FROM db_name.table_name LIMIT 1;' "
status, output = commands.getstatusoutput(cmd)
if status == 0:
print output
else:
print "error"
Answer 3 (score: 6)
You can use the hive library; for that you want to import the hive class with from hive import ThriftHive.
Try this example:
import sys
from hive import ThriftHive
from hive.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
try:
transport = TSocket.TSocket('localhost', 10000)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)
transport.open()
client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
client.execute("LOAD TABLE LOCAL INPATH '/path' INTO TABLE r")
client.execute("SELECT * FROM r")
while (1):
row = client.fetchOne()
if (row == None):
break
print row
client.execute("SELECT * FROM r")
print client.fetchAll()
transport.close()
except Thrift.TException, tx:
print '%s' % (tx.message)
Answer 4 (score: 5)
To connect using a username/password and specifying ports, the code looks like this:
from pyhive import presto
cursor = presto.connect(host='host.example.com',
port=8081,
username='USERNAME:PASSWORD').cursor()
sql = 'select * from table limit 10'
cursor.execute(sql)
print(cursor.fetchone())
print(cursor.fetchall())
Answer 5 (score: 4)
The examples above are a bit out of date. Here is a newer example:
import pyhs2 as hive
import getpass
DEFAULT_DB = 'default'
DEFAULT_SERVER = '10.37.40.1'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'PAM01-PRD01.IBM.COM'
u = raw_input('Enter PAM username: ')
s = getpass.getpass()
connection = hive.connect(host=DEFAULT_SERVER, port= DEFAULT_PORT, authMechanism='LDAP', user=u + '@' + DEFAULT_DOMAIN, password=s)
statement = "select * from user_yuti.Temp_CredCard where pir_post_dt = '2014-05-01' limit 100"
cur = connection.cursor()
cur.execute(statement)
df = cur.fetchall()
In addition to the standard python program, a few libraries need to be installed to allow Python to build the connection to the Hadoop database:
1. Pyhs2, the Python Hive Server 2 client driver
2. Sasl, Cyrus-SASL bindings for Python
3. Thrift, Python bindings for the Apache Thrift RPC system
4. PyHive, a Python interface to Hive
Remember to change the permission of the executable:
chmod +x test_hive2.py
./test_hive2.py
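For ./test_hive2.py to run directly, the script also needs a shebang line in addition to the execute bit. Here is a minimal sketch of what test_hive2.py could look like, simply wrapping the snippet above (host, port, domain, and query are the same placeholder values as in the example):
#!/usr/bin/env python
# Minimal test_hive2.py sketch: the shebang above is what lets it run as ./test_hive2.py
import getpass
import pyhs2 as hive

u = raw_input('Enter PAM username: ')
s = getpass.getpass()
# Same connection settings as the example above; adjust to your environment
connection = hive.connect(host='10.37.40.1', port=10000,
                          authMechanism='LDAP',
                          user=u + '@PAM01-PRD01.IBM.COM', password=s)
cur = connection.cursor()
cur.execute("select * from user_yuti.Temp_CredCard where pir_post_dt = '2014-05-01' limit 100")
print cur.fetchall()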
Hope it helps you. Reference: https://sites.google.com/site/tingyusz/home/blogs/hiveinpython
Answer 6 (score: 3)
Similar to eycheu's solution, but a little more detailed.
Here is an alternative solution specifically for hive2 that does not require PyHive or installing system-wide packages. I am working in a linux environment where I do not have root access, so installing the SASL dependencies mentioned in Tristin's post was not an option for me:
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager your distribution uses.
Specifically, this solution focuses on leveraging the python package JayDeBeApi. In my experience, installing this one extra package on top of a python Anaconda 2.7 install was all I needed. This package leverages java (JDK), which I assume is already set up.
Step 1: Install JayDeBeApi
pip install jaydebeapi
Step 2: Download the appropriate drivers for your environment:
Store all .jar files in a directory. I will refer to this directory as /path/to/jar/files/.
Step 3: Identify your system's authentication mechanism:
In the pyhive solutions listed, I have seen PLAIN listed as the authentication mechanism as well as Kerberos. Note that your jdbc connection URL will depend on the authentication mechanism you are using. I will explain the Kerberos solution without passing a username/password. Here is more information on Kerberos authentication and options.
Create a Kerberos ticket if one has not already been created:
$ kinit
Tickets can be viewed via klist.
You are now ready to make the connection via python:
import jaydebeapi
import glob
# Creates a list of jar files in the /path/to/jar/files/ directory
jar_files = glob.glob('/path/to/jar/files/*.jar')
host='localhost'
port='10000'
database='default'
# note: your driver will depend on your environment and drivers you've
# downloaded in step 2
# this is the driver for my environment (jdbc3, hive2, cloudera enterprise)
driver='com.cloudera.hive.jdbc3.HS2Driver'
conn_hive = jaydebeapi.connect(driver,
'jdbc:hive2://'+host+':' +port+'/'+database+';AuthMech=1;KrbHostFQDN='+host+';KrbServiceName=hive'
,jars=jar_files)
If you only care about reading, then you can read it directly into a pandas dataframe with ease via eycheu's solution:
import pandas as pd
df = pd.read_sql("select * from table", conn_hive)
Otherwise, here is a more versatile communication option:
cursor = conn_hive.cursor()
sql_expression = "select * from table"
cursor.execute(sql_expression)
results = cursor.fetchall()
As you can imagine, if you wanted to create a table, you would not need to "fetch" the results, but could instead submit a create-table query.
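For instance, submitting DDL through the same connection looks roughly like this (the table name and columns here are purely hypothetical):
# Submit a create-table statement; no fetch call is needed for DDL
cursor = conn_hive.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS my_new_table (id INT, name STRING)")
cursor.close()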
Answer 7 (score: 3)
It is common practice to prohibit users from downloading and installing packages and libraries on cluster nodes. In this case the solutions from @python-starter and @goks work perfectly, if hive runs on the same node. Otherwise, the beeline command-line tool can be used instead of hive. See details.
#python 2
import commands
cmd = 'beeline -u "jdbc:hive2://node07.foo.bar:10000/...<your connect string>" -e "SELECT * FROM db_name.table_name LIMIT 1;"'
status, output = commands.getstatusoutput(cmd)
if status == 0:
print output
else:
print "error"
#python 3
import subprocess
cmd = 'beeline -u "jdbc:hive2://node07.foo.bar:10000/...<your connect string>" -e "SELECT * FROM db_name.table_name LIMIT 1;"'
status, output = subprocess.getstatusoutput(cmd)
if status == 0:
print(output)
else:
print("error")
Answer 8 (score: 3)
Similar to @python-starter's solution, but the commands package is not available on python3.x, so the alternative is to use subprocess in python3.x:
import subprocess
cmd = "hive -S -e 'SELECT * FROM db_name.table_name LIMIT 1;' "
status, output = subprocess.getstatusoutput(cmd)
if status == 0:
print(output)
else:
print("error")
Answer 9 (score: 2)
This can be a quick hack to connect hive and python:
letter = sys.argv[2] if len(sys.argv) >= 3 else 'a'
Output: a list of tuples
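As a hedged sketch of such a quick hack with PyHive (the host and table names are placeholders, not taken from the answer), the fetchall() call is what returns that list of tuples:
from pyhive import hive

# Quick-and-dirty query; replace host and table with your own
cursor = hive.connect('YOUR_HIVE_HOST').cursor()
cursor.execute('SELECT * FROM your_table LIMIT 10')
print(cursor.fetchall())   # list of tuples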
Answer 10 (score: 2)
pyhs2 is no longer maintained. A better alternative is impyla.
Don't be confused that some of the examples above are about Impala; just change the port to 10000 (the default) for HiveServer2, and it'll work the same way as with the Impala examples. It's the same protocol (Thrift) that is used for both Impala and Hive.
https://github.com/cloudera/impyla
It has more features than pyhs2; for example, it has Kerberos authentication, which is a must for us.
from impala.dbapi import connect
conn = connect(host='my.host.com', port=10000)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print cursor.description # prints the result set's schema
results = cursor.fetchall()
##
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
process(row)
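For the Kerberos case, impyla's connect call also accepts an auth mechanism and a Kerberos service name; here is a hedged sketch (the host, service name, and parameter values are assumptions based on impyla's options, not something shown above):
from impala.dbapi import connect

# Kerberos-authenticated HiveServer2 connection; host and service name are placeholders
conn = connect(host='my.host.com', port=10000,
               auth_mechanism='GSSAPI', kerberos_service_name='hive')
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.fetchall())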
Cloudera is putting more effort now into the hs2 client https://github.com/cloudera/hs2client, a C/C++ HiveServer2/Impala client. It might be a better option if you push a lot of data to/from python. (It has Python bindings too - https://github.com/cloudera/hs2client/tree/master/python)
Some more information on impyla:
Answer 11 (score: 2)
Here is a generic approach which was easy for me because I keep connecting to several servers (SQL, Teradata, Hive, etc.) from python. Hence, I use the pyodbc connector. Here are some basic steps to get going with pyodbc (in case you have never used it):
Once that's done:
Step 1. pip install:
pip install pyodbc
(here's the link to download the relevant driver from Microsoft's website)
Step 2. Now, import the same in your python script:
import pyodbc
Step 3. Finally, go ahead and give the connection details as follows:
conn_hive = pyodbc.connect('DSN=YOUR_DSN_NAME;SERVER=YOUR_SERVER_NAME;UID=USER_ID;PWD=PSWD')
The best part of using pyodbc is that I have to import just one package to connect to almost any data source.
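For example, the same pyodbc connection can feed a pandas dataframe directly (the table name below is just a placeholder):
import pandas as pd

# Reuse the connection from step 3; replace tablename with your own table
df = pd.read_sql("SELECT * FROM tablename LIMIT 100", conn_hive)
print(df.head())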
Answer 12 (score: 1)
You can use the python JayDeBeApi package to create a DB-API connection from the Hive or Impala JDBC driver and then pass the connection to the pandas.read_sql function to return the data in a pandas dataframe.
import jaydebeapi
# Apparently need to load the jar files for the first time for impala jdbc driver to work
conn = jaydebeapi.connect('com.cloudera.hive.jdbc41.HS2Driver',
['jdbc:hive2://host:10000/db;AuthMech=1;KrbHostFQDN=xxx.com;KrbServiceName=hive;KrbRealm=xxx.COM', "",""],
jars=['/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/HiveJDBC41.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/TCLIServiceClient.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-codec-1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_metastore.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_service.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/log4j-1.2.14.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/ql.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/zookeeper-3.4.6.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ImpalaJDBC41.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/TCLIServiceClient.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-codec-1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_metastore.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_service.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/log4j-1.2.14.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ql.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/zookeeper-3.4.6.jar'
])
# the previous call have initialized the jar files, technically this call needs not include the required jar files
impala_conn = jaydebeapi.connect('com.cloudera.impala.jdbc41.Driver',
['jdbc:impala://host:21050/db;AuthMech=1;KrbHostFQDN=xxx.com;KrbServiceName=impala;KrbRealm=xxx.COM',"",""],
jars=['/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/HiveJDBC41.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/TCLIServiceClient.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-codec-1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_metastore.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/hive_service.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/log4j-1.2.14.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/ql.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/hive_jdbc_2.5.18.1050/2.5.18.1050 GA/Cloudera_HiveJDBC41_2.5.18.1050/zookeeper-3.4.6.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ImpalaJDBC41.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/TCLIServiceClient.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-codec-1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/commons-logging-1.1.1.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_metastore.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/hive_service.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpclient-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/httpcore-4.1.3.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libfb303-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/libthrift-0.9.0.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/log4j-1.2.14.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/ql.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-api-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/slf4j-log4j12-1.5.11.jar',
'/hadp/opt/jdbc/impala_jdbc_2.5.35/2.5.35.1055 GA/Cloudera_ImpalaJDBC41_2.5.35/zookeeper-3.4.6.jar'
])
import pandas as pd
df1 = pd.read_sql("SELECT * FROM tablename", conn)
df2 = pd.read_sql("SELECT * FROM tablename", impala_conn)
conn.close()
impala_conn.close()
Answer 13 (score: 1)
The easiest way is to use PyHive.
To install you'll need these libraries:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
After installation, you can connect to Hive like this:
from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
Now that you have the hive connection, you have options in how you use it. You can just straight-up query:
cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
use_result(result)
...or use the connection to make a Pandas dataframe:
import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
Answer 14 (score: 0)
I ran into the same problem as you; here is my operating environment (System: Linux; Version: python 3.6; Package: PyHive). Please refer to my answer below:
from pyhive import hive
conn = hive.Connection(host='149.129.***.**', port=10000, username='*', database='*',password="*",auth='LDAP')
The key is to add the password and auth parameters and set auth to 'LDAP'. With that it works fine; if you have any questions, please let me know.
Answer 15 (score: 0)
By using the Python client driver:
pip install pyhs2
Then:
import pyhs2
with pyhs2.connect(host='localhost',
port=10000,
authMechanism="PLAIN",
user='root',
password='test',
database='default') as conn:
with conn.cursor() as cur:
#Show databases
print cur.getDatabases()
#Execute query
cur.execute("select * from table")
#Return column info from query
print cur.getSchema()
#Fetch table results
for i in cur.fetch():
print i
Answer 16 (score: 0)
None of the answers demonstrate how to fetch and print the table headers. Below is a modified version of the standard example from PyHive, which is widely used and actively maintained.
from pyhive import hive
cursor = hive.connect(host="localhost",
port=10000,
username="shadan",
auth="KERBEROS",
kerberos_service_name="hive"
).cursor()
cursor.execute("SELECT * FROM my_dummy_table LIMIT 10")
columnList = [desc[0] for desc in cursor.description]
headerStr = ",".join(columnList)
headerTuple = tuple(headerStr.split(","))
print(headerTuple)
print(cursor.fetchone())
print(cursor.fetchall())
Answer 17 (score: 0)
Requirements:
Code:
import pandas as pd
from sqlalchemy import create_engine
SECRET = {'username':'lol', 'password': 'lol'}
user_name = SECRET.get('username')
passwd = SECRET.get('password')
host_server = 'x.x.x.x'
port = '10000'
database = 'default'
conn = f'hive://{user_name}:{passwd}@{host_server}:{port}/{database}'
engine = create_engine(conn, connect_args={'auth': 'LDAP'})
query = "select * from tablename limit 100"
data = pd.read_sql(query, con=engine)
print(data)