Using Python

Date: 2016-06-02 07:27:10

Tags: python pandas hadoop hive pyhive

I'm using Python to connect to Hive and retrieve the results into pandas, but it raises an error:

pyhive.exc.OperationalError: TExecuteStatementResp

My code (DB/hive_engines.py, imported by test.py below):

# -*- coding: utf-8 -*-
from pyhive import hive
from impala.util import as_pandas
from string import Template

config = {
    'host': '127.0.0.1',
    'database': 'default'
}

def get_conn(conf):
    # Open a HiveServer2 connection with the given settings.
    conn = hive.connect(**conf)
    return conn

def execute_hql(hql, params=None):
    # Render the HQL template with params, run it, and return a DataFrame.
    conn = get_conn(config)
    cursor = conn.cursor()
    hql = Template(hql).substitute(params or {})
    cursor.execute(hql)
    df = as_pandas(cursor)
    return df

test.py

# -*- coding: utf-8 -*-
from pyhive import hive
from impala.util import as_pandas
import DB.hive_engines

hql = """
    SELECT
        keywords,
        count(keywords)
    FROM
        table
    WHERE
        eventname = 'xxx' AND
        cdate >= '$start_date' AND
        cdate <= '$end_date'
    GROUP BY
        keywords
"""

if __name__ == '__main__':
    params = {'start_date': '2016-04-01', 'end_date': '2016-04-03'}
    df = DB.hive_engines.execute_hql(hql, params)
    print df
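
As a side note, the exact HQL that string.Template renders from these params can be checked locally before it is sent to Hive; a minimal, illustrative sketch reusing the same template mechanism:

# -*- coding: utf-8 -*-
from string import Template

# Illustrative only: render the question's template locally to inspect
# the exact statement that would be sent to HiveServer2.
hql = """
    SELECT keywords, count(keywords)
    FROM table
    WHERE eventname = 'xxx' AND
          cdate >= '$start_date' AND
          cdate <= '$end_date'
    GROUP BY keywords
"""
params = {'start_date': '2016-04-01', 'end_date': '2016-04-03'}
print Template(hql).substitute(params)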

Exception message:

pyhive.exc.OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1,
errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask',
sqlState='08S01', infoMessages=[
'*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask:28:27',
'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:326',
'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:146',
'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:173',
'org.apache.hive.service.cli.operation.Operation:run:Operation.java:268',
'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:410',
'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:391',
'sun.reflect.GeneratedMethodAccessor31:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1671',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy27:executeStatement::-1',
'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:245',
'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:509',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'],
statusCode=3), operationHandle=None)

Thanks!

2 answers:

Answer 0 (score: 0):

Updated the HQL:

hql = """
SELECT
    keywords,
    count(keywords)
FROM
    table
WHERE
    eventname = 'xxx' AND
    cdate >= '$start_date' AND
    cdate <= '$end_date'
GROUP BY
    keywords
LIMIT 100
"""

With LIMIT added, the query runs successfully.

Can anyone explain why?

Thanks, everyone.

Answer 1 (score: 0):

Following this discussion, I solved the problem by using a valid username when creating the connection.

For the completeness of this answer, I'm copying the code suggested in that thread. Note the valid username here.

from pyhive import hive

conn = hive.Connection(host='<myhost>',
                       port='<myport>',
                       database='spin1',
                       username='<a valid user>')  # IMPORTANT**
cursor = conn.cursor()
cursor.execute('SELECT ...')  # run a query before fetching results
print cursor.fetchall()

Because of the missing valid username, I was hitting the same exception mentioned in the question.
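
To tie this back to the question's pandas flow, here is a minimal sketch of the same fix applied to the asker's config. The host, port, and username below are placeholders, and as_pandas is assumed to work with the pyhive cursor just as in the question:

# -*- coding: utf-8 -*-
from pyhive import hive
from impala.util import as_pandas

# Placeholder values: point these at a reachable HiveServer2 instance
# and a user that actually exists on the cluster.
config = {
    'host': '127.0.0.1',
    'port': 10000,
    'database': 'default',
    'username': 'hive',   # per this answer, the piece that was missing
}

conn = hive.connect(**config)
cursor = conn.cursor()
# Reuses the shape of the question's query; replace with your own HQL.
cursor.execute("SELECT keywords, count(keywords) FROM table GROUP BY keywords LIMIT 10")
df = as_pandas(cursor)
print df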