我试图像下面那样运行hql文件,但是收到错误noviablealtexception
val QUERY = fromFile(s"$SQLDIR/select_cust_info.hql").getLines.mkString
sqlContext.sql(s"$QUERY").show()
你能帮忙,怎么运行它?
按照要求,select_cust_info.hql就像这样
set hive.execution.engine=mr;
--new records
insert into cust_info_stage
select row_number () over () + ${hiveconf:maxid} as row_id , name, age, sex, country , upd_date, create_date
from ${hiveconf:table} r
left join cust_dim d on id=uid
where not exists ( select 1 from cust_info c where c.id=r.id);
--upd record
insert into cust_info_stage
select row_id , name, age, sex, country , upd_date, create_date
from ${hiveconf:table} r
inner join cust_info_stage on
left join cust_dim d on id=uid
where not exists ( select 1 from cust_info c where c.id=r.id);
!quit
上面的hql只是一个示例,我想从sqlContext调用这样的hqls。
现在我要检查的下一个级别是,如果.hqls中定义了hiveconf,如何在sqlContext中传递这些变量。
答案 0 :(得分:0)
您可以尝试下面的代码在pyspark v2 +中运行hql文件
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
sc =SparkContext.getOrCreate()
sqlCtx = SQLContext(sc)
with open("/home/hadoop/test/abc.hql") as fr:
query = fr.read()
print(query)
results = sqlCtx.sql(query)