When connecting to Hive2 from Python, I use the following code:
import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Show databases
        print(cur.getDatabases())
        # Execute query
        cur.execute("select * from table")
        # Return column info from query
        print(cur.getSchema())
        # Fetch table results
        for i in cur.fetch():
            print(i)
I get the following error:
  File "C:\Users\vinbhask\AppData\Roaming\Python\Python36\site-packages\pyhs2-0.6.0-py3.6.egg\pyhs2\connections.py", line 7, in <module>
    from cloudera.thrift_sasl import TSaslClientTransport
ModuleNotFoundError: No module named 'cloudera'
These are the packages installed so far:
bitarray 0.8.1, certifi 2017.7.27.1, chardet 3.0.4, cm-api 16.0.0, cx-Oracle 6.0.1, future 0.16.0, idna 2.6, impyla 0.14.0, JayDeBeApi 1.1.1, JPype1 0.6.2, ply 3.10, pure-sasl 0.4.0, PyHive 0.4.0, pyhs2 0.6.0, pyodbc 4.0.17, requests 2.18.4, sasl 0.2.1, six 1.10.0, teradata 15.10.0.21, thrift 0.10.0, thrift-sasl 0.2.1, thriftpy 0.3.9, urllib3 1.22
The error when using Impyla:
Traceback (most recent call last):
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\Scripts\HiveConnTester4.py", line 1, in <module>
    from impala.dbapi import connect
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\dbapi.py", line 28, in <module>
    import impala.hiveserver2 as hs2
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\hiveserver2.py", line 33, in <module>
    from impala._thrift_api import (
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\_thrift_api.py", line 74, in <module>
    include_dirs=[thrift_dir])
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\__init__.py", line 30, in load
    include_dir=include_dir)
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\parser.py", line 496, in parse
    url_scheme))
thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
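The last line suggests thriftpy mistakes the Windows drive letter for a URL protocol. A minimal sketch of that behaviour, assuming thriftpy classifies the .thrift path with the standard library's urlparse (our reading of its parser, not confirmed by the traceback itself); the path below is hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical Windows path to a .thrift file (not the real impyla path).
windows_path = r"C:\path\to\example.thrift"

# urlparse reads the drive letter "C:" as a URL scheme and lowercases it.
scheme = urlparse(windows_path).scheme
print(scheme)  # c

# A POSIX-style path has no scheme, so it would not hit this error branch.
print(repr(urlparse("/path/to/example.thrift").scheme))  # ''
```

If that reading is right, the failure is tied to absolute Windows paths on the client side rather than to the Hive server.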
Answer 0 (score: 1)
thrift_sasl.py is trying to import cStringIO, which was removed in Python 3 (its replacements live in the io module). Try Python 2?
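If switching to Python 2 is not an option, the usual Python 3 replacement for cStringIO is io.BytesIO. A minimal sketch of a version-agnostic import, assuming the calling code (for example a hand-patched thrift_sasl.py) only needs an in-memory byte buffer; BytesBuffer is a hypothetical alias, and patching an installed package this way is untested here:

```python
# cStringIO existed only in Python 2; Python 3 provides io.BytesIO instead.
try:
    from cStringIO import StringIO as BytesBuffer  # Python 2 only
except ImportError:
    from io import BytesIO as BytesBuffer  # Python 3: in-memory binary buffer

# Thrift/SASL framing works on raw bytes, so a bytes buffer is the closest match.
buf = BytesBuffer()
buf.write(b"\x00\x00\x00\x05hello")  # 4-byte length prefix followed by a payload
buf.seek(0)
print(buf.read())
```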
Answer 1 (score: 1)
You may need to install the unreleased version of thrift_sasl from GitHub. Try:
pip install git+https://github.com/cloudera/thrift_sasl
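After reinstalling, a quick way to confirm which of the modules named in the tracebacks Python can now locate. A minimal sketch using importlib.util.find_spec; the helper name importable is hypothetical, and a True result only means the module can be found, not that it imports cleanly or works with pyhs2:

```python
import importlib.util


def importable(module_name):
    """Return True if Python can locate module_name (dotted names allowed)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For a dotted name, find_spec imports the parent package first;
        # a missing parent (e.g. 'cloudera') raises ModuleNotFoundError.
        return False


# Module names taken from the tracebacks in the question.
for name in ("thrift_sasl", "cloudera.thrift_sasl", "sasl", "thriftpy"):
    print(name, importable(name))
```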
Answer 2 (score: 0)
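If you are comfortable learning PySpark, you only need to set the hive.metastore.uris property to point to your Hive Metastore address, and you are good to go.
The easiest way to do that is to export hive-site.xml from the cluster and then pass --files hive-site.xml to spark-submit.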
(I haven't tried running standalone PySpark, so YMMV.)
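A minimal sketch of assembling that spark-submit call from Python. The helper build_spark_submit, the script name hive_job.py and the host metastore-host are hypothetical; the optional --conf override assumes Spark copies spark.hadoop.* properties into the Hadoop configuration where the Hive client can read hive.metastore.uris, which may vary by Spark version:

```python
import shlex


def build_spark_submit(script, hive_site="hive-site.xml", metastore_uri=None):
    """Return a spark-submit argument list that ships hive-site.xml with the job.

    Hypothetical helper: script, hive_site and metastore_uri are placeholders.
    """
    args = ["spark-submit", "--files", hive_site]
    if metastore_uri:
        # Assumption: spark.hadoop.* properties are copied into the Hadoop
        # configuration, where the Hive client can pick up hive.metastore.uris.
        args += ["--conf", "spark.hadoop.hive.metastore.uris=" + metastore_uri]
    args.append(script)
    return args


if __name__ == "__main__":
    # Hypothetical job script and metastore host; 9083 is Hive's default metastore port.
    cmd = build_spark_submit("hive_job.py", metastore_uri="thrift://metastore-host:9083")
    print(" ".join(shlex.quote(a) for a in cmd))
```

Inside hive_job.py the session would then typically be created with SparkSession.builder.enableHiveSupport().getOrCreate() so that Spark SQL can use the Hive catalog.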