Question

I am running the following job on HDP.

export SPARK_MAJOR_VERSION=2
spark-submit --class com.spark.sparkexamples.Audit --master yarn --deploy-mode cluster \
  --files /bigdata/datalake/app/config/metadata.csv \
  BRNSAUDIT_v4.jar dl_raw.ACC /bigdatahdfs/landing/AUDIT/BW/2017/02/27/ACC_hash_total_and_count_20170227.dat TH 20170227

It fails with this error:

Table or view not found: dl_raw.ACC; line 1 pos 94;
'Aggregate [count(1) AS rec_cnt#58L, 'count('BRCH_NUM) AS hashcount#59, 'sum('ACC_NUM) AS hashsum#60]
+- 'Filter (('trim('country_code) = trim(TH)) && ('from_unixtime('unix_timestamp('substr('bus_date, 0, 11), MM/dd/yyyy), yyyyMMdd) = 20170227))
   +- 'UnresolvedRelation `dl_raw`.`ACC`

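However, the table exists in Hive and can be accessed from spark-shell.

This is the code that creates the Spark session and runs the query.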

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .appName("spark session example")
  .enableHiveSupport()  // needed for access to the Hive metastore
  .getOrCreate()
sparkSession.conf.set("spark.sql.crossJoin.enabled", "true")
val df_table_stats = sparkSession.sql("""select count(*) as rec_cnt, count(distinct BRCH_NUM) as hashcount, sum(ACC_NUM) as hashsum
                                         from dl_raw.ACC
                                         where trim(country_code) = trim('BW')
                                         and from_unixtime(unix_timestamp(substr(bus_date,0,11),'MM/dd/yyyy'),'yyyyMMdd')='20170227'
                                      """)

Answer 1

Include hive-site.xml in the --files parameter when you submit the Spark job.
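
As a rough sketch of this suggestion, the command from the question could be extended as shown here. The hive-site.xml location is an assumption (/etc/hive/conf is where the other answer copies it from; your cluster may differ), and the HIVE_SITE, APP_FILES, FILES_ARG and RUN names are only illustrative. --files accepts a comma-separated list, so both files go into a single argument.

```shell
# Assumed location of hive-site.xml; override HIVE_SITE if your cluster differs.
HIVE_SITE="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"
# The metadata file already passed in the question's command.
APP_FILES="/bigdata/datalake/app/config/metadata.csv"

# --files takes one comma-separated list of files to ship with the job.
FILES_ARG="${APP_FILES},${HIVE_SITE}"

export SPARK_MAJOR_VERSION=2

# RUN is an illustrative switch: by default only print what would be passed.
if [ "${RUN:-0}" = "1" ]; then
  spark-submit --class com.spark.sparkexamples.Audit \
    --master yarn --deploy-mode cluster \
    --files "$FILES_ARG" \
    BRNSAUDIT_v4.jar dl_raw.ACC \
    /bigdatahdfs/landing/AUDIT/BW/2017/02/27/ACC_hash_total_and_count_20170227.dat \
    TH 20170227
else
  echo "would pass: --files $FILES_ARG"
fi
```

Running it with RUN=1 set in the environment submits the job; without it, the script only prints the --files value so it can be checked first.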

Answer 2

You can also copy the hive-site.xml configuration file from the Hive conf directory to the Spark conf directory. This should resolve your issue.

cp /etc/hive/conf/hive-site.xml /etc/spark2/conf
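
After the copy, a quick sanity check along these lines may help confirm that Spark's conf directory actually holds a usable file. This is only a sketch: check_hive_site is a hypothetical helper, the /etc/spark2/conf default is taken from the command above, and looking for hive.metastore.uris assumes the cluster uses a remote metastore configured through that property.

```shell
# Directory taken from the cp command above; override SPARK_CONF_DIR if needed.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-/etc/spark2/conf}"

# Hypothetical helper: returns 0 if <dir>/hive-site.xml exists and mentions
# hive.metastore.uris, 1 if the file is missing, 2 if the property is absent.
check_hive_site() {
  f="$1/hive-site.xml"
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    return 1
  fi
  if ! grep -q "hive.metastore.uris" "$f"; then
    echo "no hive.metastore.uris entry in $f"
    return 2
  fi
  echo "ok: $f"
}

# Report the result without aborting the surrounding shell session.
check_hive_site "$SPARK_CONF_DIR" || true
```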

spark2 sql cannot access Hive tables in HDP
