Question

Let's say table_1 lives in database_1.

import pyodbc
connection = pyodbc.connect(dsn='hive', autocommit=True)
cursor = connection.cursor()
cursor.execute("USE database_1")
cursor.execute("SELECT * FROM table_1")

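This gives a "table not found" error, because by the time the cursor executes the next query, the database we are using has been reset to the default. Is there a way to keep a consistent state, or to bundle multiple queries into one execute statement, to avoid this? I am particularly interested in being able to set the number of mappers/reducers and have that setting persist when the next query executes. I know the alternative is to use Python to shell out to Hive and run an HQL file, but I would rather not do that.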

Answer 1

I suggest you try a few things:

I think that in most, if not all, cases you can specify which database to use in the connection string.
I see in the documentation that the 'execute' method returns the cursor itself, so I would even try chaining the calls:

cursor.execute("USE database_1").execute("SELECT * FROM table_1")

(just in case the documentation is wrong)

This might actually work:

cursor.execute("USE database_1")
cursor.commit()
cursor.execute("SELECT * FROM table_1")

Please let us know whether it works.
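A minimal sketch of the first suggestion (putting the database in the connection string). It assumes a Cloudera/Hortonworks-style Hive ODBC driver that reads the default database from a Schema key; the key name can differ between drivers, so check your driver's documentation. The helper names build_hive_conn_str and connect_hive are hypothetical.

```python
def build_hive_conn_str(dsn, database=None):
    """Return an ODBC connection string for a Hive DSN.

    ASSUMPTION: the driver behind the DSN accepts a `Schema` key that sets
    the database used for unqualified table names.
    """
    parts = ["DSN=" + dsn]
    if database is not None:
        parts.append("Schema=" + database)
    return ";".join(parts)


def connect_hive(dsn, database=None):
    """Open a pyodbc connection whose session should start in `database`."""
    import pyodbc  # imported here so build_hive_conn_str works without pyodbc installed
    return pyodbc.connect(build_hive_conn_str(dsn, database), autocommit=True)
```

If the driver honours that key, connect_hive("hive", "database_1").cursor().execute("SELECT * FROM table_1") should no longer rely on a USE statement surviving from one execute call to the next.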

Answer 2

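As far as I can tell from the pyodbc documentation, there is no specific support for Hive. If you are open to a different library, pyhs2 specifically supports connecting to HiveServer2 (Hive 0.11 or newer, I believe). It can be installed with pip (pip install pyhs2), but at least on my Mint Linux 17 machine I also had to install libpython-dev and libsasl2-dev first.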

I mocked up a simple approximation of your scenario in Hive (table_1 is in database_1, not in default):

hive> use default;
OK
Time taken: 0.324 seconds
hive> select * from table_1;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'table_1'
hive> use database_1;
OK
Time taken: 0.333 seconds
hive> describe table_1;
OK
content                     string                              
Time taken: 0.777 seconds, Fetched: 1 row(s)
hive> select * from table_1;
OK
this
is
some
sample
data
Time taken: 0.23 seconds, Fetched: 5 row(s)

Here is a basic script that uses pyhs2 to connect to Hive:

# Python 2.7
import pyhs2
from pyhs2.error import Pyhs2Exception

hql = "SELECT * FROM table_1"
with pyhs2.connect(
  host='localhost', port=10000, authMechanism="PLAIN", user="root",
  database="default"  # Of course it's possible just to specify database_1 here
) as db:
  with db.cursor() as cursor:

    try:
      print "Trying default database"
      cursor.execute(hql)
      for row in cursor.fetch(): print row
    except Pyhs2Exception as error:
      print(str(error))

    print "Switching databases to database_1"
    cursor.execute("use database_1")
    cursor.execute(hql)
    for row in cursor.fetch(): print row

Here is the resulting output:

Trying default database
"Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'table_1'"
Switching databases to database_1
['this']
['is']
['some']
['sample']
['data']

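As I noted in the code comment, it is entirely possible to open the connection on database_1 directly instead of default, but I wanted to mimic what you are doing in the code from your question (and to demonstrate switching databases after the connection has been opened).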

Anyway, if you are open to a non-pyodbc solution, I hope this gives you something to work with.

Answer 3

I learned that you can set things like the number of reducers in the ODBC connection string, for example:

string = 'dsn=hive/driver/path;mapred.reduce.tasks=100;....'
connection = pyodbc.connect(string, autocommit=True)

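This lets you put the specific settings you need on the connection itself. It does not solve the problem of switching databases, but it does solve the other case, getting settings into Hive, which was most of my problem.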
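When several settings are needed, a small helper can append them to an existing connection string so that every new connection starts with them. This is a sketch that assumes, as reported above, that the driver accepts Hive properties such as mapred.reduce.tasks as plain key=value pairs; add_hive_settings is an illustrative name, not part of pyodbc.

```python
def add_hive_settings(conn_str, settings):
    """Append Hive session settings to an ODBC connection string.

    ASSUMPTION: the Hive ODBC driver forwards extra key=value pairs
    (e.g. mapred.reduce.tasks) to the Hive session, as reported above.
    """
    # Split on ';' and drop empty fragments left by a trailing or doubled ';'.
    parts = [p for p in conn_str.split(";") if p]
    # Sort the keys so the resulting string is deterministic.
    for key in sorted(settings):
        parts.append("{}={}".format(key, settings[key]))
    return ";".join(parts)
```

For example, add_hive_settings("DSN=hive", {"mapred.reduce.tasks": 100}) gives "DSN=hive;mapred.reduce.tasks=100", which can then be passed to pyodbc.connect(..., autocommit=True) as in the snippet above.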

Python ODBC cursor: keeping persistent state after executing a query
