"Connection refused" when running a Hive query from a Python script using Thrift

Time: 2012-12-17 22:16:47

Tags: python hadoop mapreduce hive thrift

All,

I am trying to run Hive queries from a Python script using the Thrift library for Python. I can run queries that do not launch a M/R job, such as "create table" or "select * from table". But when I run a query that does launch a M/R job, such as "select * from table where ...", I get the following exception.

starting hive server...

Hive history file=/tmp/root/hive_job_log_root_201212171354_275968533.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.net.ConnectException: Call to sp-rhel6-01/172.22.193.79:54311 failed on connection exception: java.net.ConnectException: Connection refused

Job Submission failed with exception 'java.net.ConnectException(Call to sp-rhel6-01/172.22.193.79:54311 failed on connection exception: java.net.ConnectException: Connection refused)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
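For reference, the "Connection refused" above is raised while submitting the job to sp-rhel6-01:54311, which looks like the JobTracker RPC address for this cluster. A minimal way to check whether anything is listening there, using the host and port taken from the log (this is just a plain socket probe, not part of the Hive/Thrift API):

import socket

# Host and port copied from the error message above; 54311 appears to be
# the JobTracker RPC port configured for this cluster.
HOST, PORT = '172.22.193.79', 54311

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
try:
    s.connect((HOST, PORT))
    print 'JobTracker port is reachable'
except socket.error as e:
    # "Connection refused" here means nothing is listening on that port,
    # i.e. the JobTracker (or the MapReduce service) is likely down.
    print 'Cannot connect: %s' % e
finally:
    s.close()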

I have a multi-node Hadoop cluster. Hive is installed on the namenode, and I am running the Python script on that same namenode.

The Python script is:

from hive_service import ThriftHive
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

transport = TSocket.TSocket('172.22.193.79', 10000)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)

client = ThriftHive.Client(protocol)
transport.open()

client.execute("select count(*) from example ")
print client.fetchAll();
transport.close()

Can anyone help me understand this error?

-Sushant

1 Answer:

Answer 0 (score: 0)

I had trouble completing SELECT queries, but I could complete SHOW and DESCRIBE queries. The way I fixed it was to restart the services on the cluster. I am using Cloudera to manage my cluster, so the command I ran was $ sudo /etc/init.d/cloudera-scm-agent hard_restart. I didn't spend much time debugging, but my guess is that the NN or the JT crashed. Interestingly, I could still complete queries against metadata. My best guess is that those queries go straight to the metastore and don't have to touch HDFS. I'd need someone to confirm that.
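To illustrate the distinction drawn above, here is a rough sketch reusing the same connection setup as in the question's script (the table name "example" comes from the question; the comments state this answer's hypothesis, which hasn't been confirmed):

from hive_service import ThriftHive
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

# Same connection setup as in the question's script.
sock = TSocket.TSocket('172.22.193.79', 10000)
transport = TTransport.TBufferedTransport(sock)
client = ThriftHive.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

# Metadata-only statements are served from the metastore and should still
# succeed even when the JobTracker is down.
client.execute("show tables")
print client.fetchAll()

client.execute("describe example")
print client.fetchAll()

# This one has to launch a MapReduce job, so it is the kind of query that
# fails with the "Connection refused" error when the JT is unreachable.
client.execute("select count(*) from example")
print client.fetchAll()

transport.close()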