I'm using Python to connect to Hive and load query results into pandas, but it raises an error:
pyhive.exc.OperationalError: TExecuteStatementResp
My code:
# -*- coding: utf-8 -*-
from pyhive import hive
from impala.util import as_pandas
from string import Template

config = {
    'host': '127.0.0.1',
    'database': 'default'
}

def get_conn(conf):
    conn = hive.connect(**conf)
    return conn

def execute_hql(hql, params = None):
    conn = get_conn(config)
    cursor = conn.cursor()
    hql = Template(hql).substitute(params)
    cursor.execute(hql)
    df = as_pandas(cursor)
    return df
test.py
# -*- coding: utf-8 -*-
from pyhive import hive
from impala.util import as_pandas
import DB.hive_engines

hql = """
    SELECT
        keywords,
        count(keywords)
    FROM
        table
    WHERE
        eventname = 'xxx' AND
        cdate >= '$start_date' AND
        cdate <= '$end_date'
    GROUP BY
        keywords
"""

if __name__ == '__main__':
    params = {'start_date': '2016-04-01', 'end_date': '2016-04-03'}
    df = DB.hive_engines.execute_hql(hql, params)
    print df
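When debugging a failure like this, it can help to look at the exact HQL that reaches Hive after `Template` substitution. The sketch below is only an illustration: `render_hql` is a hypothetical helper that is not part of the original module, it is written for Python 3, and it assumes the parameter values are trusted, since string templating does no escaping.

```python
from string import Template


def render_hql(hql, params=None):
    """Return the HQL text after $-placeholder substitution.

    Hypothetical debugging helper: it only formats the string and
    never talks to Hive.
    """
    # substitute() raises KeyError when a placeholder has no value,
    # which surfaces typos in parameter names early.
    return Template(hql).substitute(params or {})


hql = """
SELECT keywords, count(keywords)
FROM table
WHERE cdate >= '$start_date' AND cdate <= '$end_date'
GROUP BY keywords
"""

rendered = render_hql(hql, {'start_date': '2016-04-01', 'end_date': '2016-04-03'})
print(rendered)
```

Pasting the printed statement into beeline or the Hive CLI shows whether the query itself fails, independently of the Python client.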
Exception message:
pyhive.exc.OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask', sqlState='08S01', infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask:28:27', 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:326', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:146', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:173', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:268', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:410', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:391', 'sun.reflect.GeneratedMethodAccessor31:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:606', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:415', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1671', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy27:executeStatement::-1', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:245', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:509', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', 'java.lang.Thread:run:Thread.java:745'], statusCode=3), operationHandle=None)
Thanks!
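Most of the exception text is a server-side Java stack trace; the informative fields are `errorMessage` and `sqlState`. The sketch below pulls those two out of a caught exception. It assumes, as the printed repr suggests but the post does not confirm, that the first argument of the exception is the Thrift response object; `summarize_hive_error` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins so the example runs without a Hive server. Note that return code 1 from `MapRedTask` only says that the MapReduce job failed; the underlying cause usually has to be looked up in the HiveServer2 or job logs.

```python
from types import SimpleNamespace


def summarize_hive_error(exc):
    """Return a short 'sqlState: errorMessage' string for an exception.

    Assumes exc.args[0] is a response object whose .status carries
    errorMessage and sqlState; falls back to str(exc) otherwise.
    """
    response = exc.args[0] if exc.args else None
    status = getattr(response, 'status', None)
    message = getattr(status, 'errorMessage', None)
    if message is None:
        return str(exc)
    sql_state = getattr(status, 'sqlState', None)
    return '{}: {}'.format(sql_state, message)


# Stand-in for the response carried by the exception in the question.
fake_status = SimpleNamespace(
    errorMessage='Error while processing statement: FAILED: Execution Error, '
                 'return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask',
    sqlState='08S01',
)
fake_exc = Exception(SimpleNamespace(status=fake_status))

summary = summarize_hive_error(fake_exc)
print(summary)
```

In real use the helper would be called from an `except` block around `cursor.execute(hql)`.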
Answer 0 (score: 0)
I updated the hql:
hql = """
    SELECT
        keywords,
        count(keywords)
    FROM
        table
    WHERE
        eventname = 'xxx' AND
        cdate >= '$start_date' AND
        cdate <= '$end_date'
    GROUP BY
        keywords
    LIMIT 100
"""
After adding the LIMIT clause, the query runs successfully.
Can anyone explain why?
Thanks, everyone.
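Answer 1 (score: 0)
Following this discussion, I solved the problem by passing a valid username when creating the connection.
For the sake of completeness, I'm copying the code suggested in that forum. Note the valid username here.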
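from pyhive import hive

conn = hive.Connection(host='<myhost>',
                       port='<myport>',
                       database='spin1',
                       username='<a valid user>')  # IMPORTANT: must be a valid user
cursor = conn.cursor()
cursor.execute('SHOW TABLES')  # placeholder statement; fetchall() needs an executed query
print cursor.fetchall()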
I was hitting the same exception as in the question because I had not supplied a valid username.
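Applied to the `config` dictionary from the question, this fix amounts to passing `username` through to `hive.connect`. A minimal sketch in Python 3, assuming the user name comes from an explicit argument or an environment variable and otherwise falls back to the local login name. `build_hive_config`, the `HIVE_USER` variable and the `'hive_user'` value are hypothetical names chosen for illustration; the account still has to be one the cluster accepts.

```python
import getpass
import os


def build_hive_config(host, database, username=None):
    """Build keyword arguments for pyhive's hive.connect().

    HIVE_USER is a hypothetical environment variable used for illustration.
    """
    user = username or os.environ.get('HIVE_USER') or getpass.getuser()
    return {'host': host, 'database': database, 'username': user}


def get_conn(conf):
    # Imported lazily so the config helper works without pyhive installed.
    from pyhive import hive
    return hive.connect(**conf)


config = build_hive_config('127.0.0.1', 'default', username='hive_user')
print(config)
```

With this in place, `get_conn(config)` opens the connection under the chosen account instead of relying on whatever default the client picks.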