Accessing Hive from a remote server via Python

Time: 2015-09-23 15:49:52

Tags: python-2.7 hadoop hive

I installed the following required packages on the remote server in order to access Hive from Python.

Python 2.7.6, Python development tools, pyhs2, SASL-0.1.3, Thrift 0.9.1, PyHive-0.1.0

Here is the Python script that accesses Hive.

#!/usr/bin/env python
import pyhs2 as hive
import getpass
DEFAULT_DB = 'camp'
DEFAULT_SERVER = '10.25.xx.xx'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'xxx.xxxxxx.com'

# Get the username and password
u = raw_input('Enter PAM username: ')
s = getpass.getpass()
# Build the Hive Connection
connection = hive.connect(host=DEFAULT_SERVER, port=DEFAULT_PORT,
                          authMechanism='LDAP', user=u + '@' + DEFAULT_DOMAIN,
                          password=s)
# Hive query statement
statement = "select * from camp.test"
cur = connection.cursor()

# Runs a Hive query and returns the result as a list of lists
cur.execute(statement)
df = cur.fetchall()

This is the output I get:

File "build/bdist.linux-x86_64/egg/pyhs2/__init__.py", line 7, in connect
File "build/bdist.linux-x86_64/egg/pyhs2/connections.py", line 46, in __init__
File "build/bdist.linux-x86_64/egg/pyhs2/cloudera/thrift_sasl.py", line 74, in open
File "build/bdist.linux-x86_64/egg/pyhs2/cloudera/thrift_sasl.py", line 92, in _recv_sasl_message
File "build/bdist.linux-x86_64/egg/thrift/transport/TTransport.py", line 58, in readAll
File "build/bdist.linux-x86_64/egg/thrift/transport/TSocket.py", line 118, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

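I don't see any errors in the output after running the script, but I don't see any query results on the screen either. I'm not sure why no query results are shown; the Hive server IP, port, user, and password are all correct. I have also verified the connection between the Hive server and the remote server, and there is no connectivity problem.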
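
The connectivity check mentioned above can be reproduced from Python itself. The following is a minimal sketch using only the standard library; `can_reach` is a hypothetical helper name, not part of pyhs2 or PyHive. It only shows that the port accepts TCP connections. The traceback above fails inside `_recv_sasl_message`, that is, after the socket is already open, so a passing check here does not rule out a problem in the SASL handshake.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; it does not speak Thrift or SASL.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
    conn.close()
    return True
```

With the names from the script, the call would be `can_reach(DEFAULT_SERVER, DEFAULT_PORT)`.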

If you have any suggestions or a solution, please help. Thank you.

2 answers:

Answer 0: (score: 0)

Try using this code:

import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        #Show databases
        print cur.getDatabases()

        #Execute query
        cur.execute("select * from table")

        #Return column info from query
        print cur.getSchema()

        #Fetch table results
        for i in cur.fetch():
            print i
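
The code above uses `authMechanism="PLAIN"` where the question used `'LDAP'`, without comment. A frequently reported cause of `TSocket read 0 bytes` during connect is that the client mechanism does not match the server's `hive.server2.authentication` setting in hive-site.xml, although nothing in the question confirms that this is the case here. The sketch below encodes one reading of how server settings line up with pyhs2's `authMechanism` values. The table and the helper name `pick_auth_mechanism` are assumptions to verify against your own Hive and pyhs2 versions, and PAM and CUSTOM are left out on purpose.

```python
# Assumed mapping from the server's hive.server2.authentication value
# to a pyhs2 authMechanism string. Verify against your own setup.
SERVER_AUTH_TO_PYHS2 = {
    "NOSASL": "NOSASL",      # raw Thrift transport, no SASL handshake
    "NONE": "PLAIN",         # server still expects a SASL PLAIN handshake
    "LDAP": "LDAP",
    "KERBEROS": "KERBEROS",
}


def pick_auth_mechanism(server_setting):
    """Return the assumed pyhs2 authMechanism for a server setting.

    Hypothetical helper; raises ValueError for settings not in the table.
    """
    key = server_setting.strip().upper()
    if key not in SERVER_AUTH_TO_PYHS2:
        raise ValueError("no assumed mapping for %r" % server_setting)
    return SERVER_AUTH_TO_PYHS2[key]
```

The result would be passed as `authMechanism=pick_auth_mechanism(...)` in the `pyhs2.connect` call.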

Answer 1: (score: 0)

I have successfully connected using the following:

from pyhive import presto
DEFAULT_DB = 'XXXXX'
DEFAULT_SERVER = 'server.name.blah'
DEFAULT_PORT = 8000

# Username
u = "user"

# Build the connection (this goes through PyHive's Presto interface)
connection = presto.connect(host=DEFAULT_SERVER, port=DEFAULT_PORT, username=u)

# Hive query statement
statement = "select * from public.dudebro limit 5"
cur = connection.cursor()

# Runs the query and returns the result as a list of lists
cur.execute(statement)
df = cur.fetchall()
print df
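
`fetchall()` returns bare rows with no column names. PyHive cursors follow the Python DB-API 2.0 (PEP 249), where `cursor.description` holds one 7-item sequence per column with the name first. Below is a minimal sketch that pairs the two, using made-up stand-in data so it runs without a server; `rows_to_dicts` and the columns `id` and `name` are hypothetical.

```python
def rows_to_dicts(description, rows):
    """Pair each row with the column names taken from a DB-API description."""
    names = [col[0] for col in description]
    return [dict(zip(names, row)) for row in rows]


# Stand-in data shaped like cursor.description and cursor.fetchall().
# The column names and values are invented for illustration.
description = [
    ("id", "bigint", None, None, None, None, True),
    ("name", "varchar", None, None, None, None, True),
]
rows = [(1, "alice"), (2, "bob")]

for record in rows_to_dicts(description, rows):
    print(record)
```

Against a live connection the call would be `rows_to_dicts(cur.description, cur.fetchall())`.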