Calling an .hql file directly from Spark

Time: 2018-05-22 16:14:35

Tags: scala apache-spark hive pyspark hiveql

I am trying to run an .hql file as shown below, but I get a NoViableAltException error:

    import scala.io.Source.fromFile

    val QUERY = fromFile(s"$SQLDIR/select_cust_info.hql").getLines.mkString
    sqlContext.sql(s"$QUERY").show()

Can you help me figure out how to run it?

As requested, here is what select_cust_info.hql looks like:

    set hive.execution.engine=mr;

    --new records
    insert into cust_info_stage
    select row_number() over () + ${hiveconf:maxid} as row_id, name, age, sex, country, upd_date, create_date
    from ${hiveconf:table} r
    left join cust_dim d on id=uid
    where not exists (select 1 from cust_info c where c.id=r.id);

    --upd record
    insert into cust_info_stage
    select row_id, name, age, sex, country, upd_date, create_date
    from ${hiveconf:table} r
    inner join cust_info_stage on
    left join cust_dim d on id=uid
    where not exists (select 1 from cust_info c where c.id=r.id);
    !quit

The .hql above is just a sample; I want to call .hql files like this from sqlContext.

The next thing I want to work out is how to pass the hiveconf variables through sqlContext when such variables are defined in the .hql files.

1 Answer:

Answer 0 (score: 0)

You can try the code below to run an .hql file in PySpark v2+:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Get (or create) the active SparkContext and wrap it in a SQLContext
    sc = SparkContext.getOrCreate()
    sqlCtx = SQLContext(sc)

    # Read the query text from the .hql file and execute it
    with open("/home/hadoop/test/abc.hql") as fr:
        query = fr.read()
        print(query)
        results = sqlCtx.sql(query)
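
One caveat: `sqlCtx.sql()` (like `SparkSession.sql()` in Spark 2+) parses a single statement at a time, so a file like `select_cust_info.hql` above, which mixes a `set` command, comments, two `insert` statements, and the beeline directive `!quit`, will still fail with a parse error such as the NoViableAltException from the question. Below is a minimal sketch of one way to run such a file and pass in the `${hiveconf:...}` variables: strip comments and beeline directives, substitute the variables, and execute the statements one by one. The `hiveconf` values shown are made-up examples, and the split on `;` assumes semicolons never appear inside string literals.

    import re
    from pyspark.sql import SparkSession

    # Spark 2+ entry point; Hive support is needed for the inserts into Hive tables
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Example values for the ${hiveconf:...} placeholders (hypothetical)
    hiveconf = {"maxid": "1000", "table": "cust_info_delta"}

    with open("/home/hadoop/test/abc.hql") as fr:
        script = fr.read()

    # Replace each ${hiveconf:name} with its supplied value
    script = re.sub(r"\$\{hiveconf:(\w+)\}", lambda m: hiveconf[m.group(1)], script)

    # Drop comment lines and beeline-only directives such as !quit
    script = "\n".join(
        line for line in script.splitlines()
        if not line.strip().startswith(("--", "!"))
    )

    # sql() runs one statement at a time, so split the script on ';'
    for stmt in (s.strip() for s in script.split(";")):
        if not stmt:
            continue
        if stmt.lower().startswith("set "):
            # Forward Hive 'set' commands to the Spark conf (best effort)
            key, _, value = stmt[4:].partition("=")
            spark.conf.set(key.strip(), value.strip())
        else:
            spark.sql(stmt)

Note that Hive-only settings such as `hive.execution.engine` have no meaning to Spark's own execution engine, so forwarding them to `spark.conf.set()` is harmless but has no effect; they could just as well be dropped.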