I need to execute the following Hive query from Python:
SELECT * FROM user WHERE age > ${hiveconf:AGE}
For now, I have the following working snippet:
import pyhs2

with pyhs2.connect(host='localhost',
                   port=60850,
                   authMechanism="PLAIN",
                   user='hduser',
                   database='default') as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM user WHERE age > ?", 10)
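So I can pass parameters to the query with PyHs2. But how can I perform the variable substitution from Python code without changing the original query (that is, replace ${hiveconf:AGE} with some value in a clean way)?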
Answer 0 (score: 1)
Something like this?
def get_sql(substitution="${hiveconf:AGE}"):
    sql = "select * from bla where blub > {variable}"
    sql = sql.format(variable=substitution)
    return sql
Result:
get_sql()
"select * from bla where blub > ${hiveconf:AGE}"
get_sql("test")
"select * from bla where blub > test"
For more details on the format syntax, see: https://docs.python.org/2/library/string.html#format-string-syntax
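If the query text has to stay exactly as written, with the ${hiveconf:AGE} placeholder in it, one option is to substitute the placeholder with a regular expression instead of rewriting the query as a format string. The following is only a minimal sketch: substitute_hiveconf and HIVECONF_PATTERN are hypothetical names, not part of PyHs2 or Hive, and the values are inserted as plain text without any escaping, so it should only be used with trusted values.

```python
import re

# Matches Hive-style placeholders such as ${hiveconf:AGE}.
HIVECONF_PATTERN = re.compile(r"\$\{hiveconf:(\w+)\}")


def substitute_hiveconf(query, variables):
    """Return query with each ${hiveconf:NAME} replaced by str(variables[NAME]).

    Hypothetical helper: plain text substitution, no quoting or escaping.
    """
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError("no value supplied for hiveconf variable %r" % name)
        return str(variables[name])

    return HIVECONF_PATTERN.sub(replace, query)


# The original query is left untouched; only the copy sent to Hive changes.
query = "SELECT * FROM user WHERE age > ${hiveconf:AGE}"
sql = substitute_hiveconf(query, {"AGE": 10})
```

The resulting string can then be passed to cur.execute(sql) as in the question.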
Answer 1 (score: 1)
You can use subprocess in Python. Store the SQL in a separate file and execute it in the following way. You can also add more variables.
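import subprocess

value1 = str(your_value)  # your_value is a placeholder for the value to pass
# Pass the value to the script as a hiveconf variable.
p = subprocess.Popen(["hive", "-f", "/sql/file/location/script.hql",
                      "--hiveconf", "variable1=" + value1],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, err = p.communicate()
# With stderr piped, err is never None, so check the exit code instead.
if p.returncode == 0:
    print("successful")
else:
    print("not successful")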
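To pass more than one variable, the command can be built from a dictionary. This is a rough sketch under the assumption that each variable goes in its own --hiveconf name=value argument; build_hive_command is a hypothetical helper and CITY is a made-up variable used only for illustration.

```python
def build_hive_command(script_path, variables):
    """Build the argument list for: hive -f script --hiveconf name=value ...

    Hypothetical helper; variables are sorted by name so the result is stable.
    """
    command = ["hive", "-f", script_path]
    for name, value in sorted(variables.items()):
        command += ["--hiveconf", "%s=%s" % (name, value)]
    return command


# AGE comes from the question; CITY is an invented second variable.
command = build_hive_command("/sql/file/location/script.hql",
                             {"AGE": 10, "CITY": "Berlin"})
```

The list can be handed to subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE) without shell=True, which avoids shell quoting problems in the values.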
Or, if you want to run it through pyhs2, format the statement like this before executing it:
cur.execute("SELECT * FROM user WHERE age > %d" % 10)