Question

I am trying to run an hql file as shown below, but I get a NoViableAltException error.

val QUERY = fromFile(s"$SQLDIR/select_cust_info.hql").getLines.mkString
sqlContext.sql(s"$QUERY").show()

Could you help me figure out how to run it?

As requested, select_cust_info.hql looks like this:

set hive.execution.engine=mr;
    --new records
    insert into cust_info_stage 
    select row_number () over () + ${hiveconf:maxid} as row_id , name, age, sex, country , upd_date, create_date
    from ${hiveconf:table} r
    left join  cust_dim d on id=uid
    where not  exists ( select 1 from cust_info c where c.id=r.id);

    --upd record 
    insert into cust_info_stage 
    select row_id , name, age, sex, country , upd_date, create_date
    from ${hiveconf:table} r
    inner join cust_info_stage on 
    left join  cust_dim d on id=uid
    where not  exists ( select 1 from cust_info c where c.id=r.id);
    !quit

The hql above is just a sample; I want to call hql files like this from sqlContext.

The next thing I need to work out is how to pass hiveconf variables through sqlContext when the .hql files define them.

Answer 1

You can try the code below to run an hql file in pyspark v2+:

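from pyspark import SparkContext
from pyspark.sql import SQLContext

# Reuse the running SparkContext, or create one if none exists
sc = SparkContext.getOrCreate()
sqlCtx = SQLContext(sc)

# Read the whole .hql file; sql() runs one statement per call
with open("/home/hadoop/test/abc.hql") as fr:
    query = fr.read()
    print(query)
    results = sqlCtx.sql(query)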
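
Note that the snippet above passes the whole file to sql() in one call, which only works when the file holds a single statement. The select_cust_info.hql in the question has several statements, `--` comments, `${hiveconf:...}` placeholders and a `!quit` line, any of which may be what triggers the parse error. Below is a minimal sketch of one way to handle that: substitute the placeholders yourself, drop comment and `!` lines, split on `;`, and run the statements one by one. The helper names `prepare_hql` and `run_hql` and the `hiveconf` dict are hypothetical, not part of any Spark API, and the split on `;` is naive (it assumes no semicolons inside string literals).

```python
import re


def prepare_hql(text, hiveconf):
    """Turn the text of an .hql script into a list of single SQL statements.

    `hiveconf` is a plain dict (hypothetical helper argument) mapping
    variable names to the values that should replace ${hiveconf:name}.
    """
    # Replace every ${hiveconf:name} placeholder; a missing name raises KeyError.
    def substitute(match):
        return str(hiveconf[match.group(1)])

    text = re.sub(r"\$\{hiveconf:(\w+)\}", substitute, text)

    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, full-line "--" comments and CLI commands like "!quit".
        if not stripped or stripped.startswith("--") or stripped.startswith("!"):
            continue
        kept.append(stripped)

    # Naive split on ";" (assumes no semicolons inside string literals).
    statements = [s.strip() for s in " ".join(kept).split(";")]
    return [s for s in statements if s]


def run_hql(spark, path, hiveconf):
    """Run each statement of the file through spark.sql, one call per statement."""
    with open(path) as fr:
        statements = prepare_hql(fr.read(), hiveconf)
    return [spark.sql(stmt) for stmt in statements]


# Example call (path and values are placeholders, not taken from the question):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# run_hql(spark, "/path/to/select_cust_info.hql", {"maxid": 100, "table": "cust_src"})
```

The same `prepare_hql` output can be fed to `sqlCtx.sql` instead of `spark.sql`. Each statement still has to be valid for the Spark SQL parser on its own, so an incomplete clause in the script will fail regardless.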

Calling an .hql file directly from Spark
