Correct SQL query raises a ParseException on Spark 2 (Scala)

Date: 2019-07-15 08:41:25

Tags: sql scala apache-spark parseexception

I wrote an HQL query that runs fine in Hive. Converted into a dynamically built Spark SQL query in Scala, it raises a ParseException:

    val l="1900-01-01 00:00:00.000001"
    val use_database="dev_lkr_send"

    val dfMet = spark.sql(s"""select 
    maxxx.cd_anomalie,
    maxxx.cd_famille,
    maxxx.libelle AS LIB_ANOMALIE,
    maxxx.MAJ_DATE AS DT_MAJ,
    maxxx.classification,
    maxxx.nb_rejeux AS NB_REJEUX,
    case when maxxx.indic_cd_erreur = 'O' then 1 else 0 end AS TOP_INDIC_CD_ERREUR,
    case when maxxx.invalidation_coordonnee = 'O' then 1 else 0 end AS TOP_COORDONNEE_INVALIDE,
    case when maxxx.typ_mvt = 'S' then 1 else 0 end AS TOP_SUPP,
    case when maxxx.typ_mvt = 'S' then to_date(substr(maxxx.dt_capt, 1, 19)) else null end AS DT_SUPP,
    minnn.typ_mvt,
    maxxx.typ_mvt,
    case when minnn.typ_mvt = 'C' then 'C' else 'M' end as TYP_MVT
    from 
      (select s.cd_anomalie, s.cd_famille, s.libelle, s.maj_date, s.classification, s.nb_rejeux, s.dt_capt, s.typ_mvt from ${use_database}.pz_send_param_ano as s
        join
        (select cd_anomalie, min(dt_capt) as dtmin from ${use_database}.pz_send_param_ano where '"""+l+"""' <dtcapt group by cd_anomalie) as minn
        on s.cd_anomalie=minn.cd_anomalie and s.dt_capt=minn.dtmin) as minnn
    join
        (select s.cd_anomalie, s.cd_famille, s.libelle, s.maj_date, s.classification, s.nb_rejeux, s.dt_capt, s.typ_mvt, s.indic_cd_erreur, s.invalidation_coordonnee from ${use_database}.pz_send_param_ano as s
        join
        (select cd_anomalie, max(dt_capt) as dtmax from ${use_database}.pz_send_param_ano group by cd_anomalie) as maxx
        on s.cd_anomalie=maxx.cd_anomalie and s.dt_capt=maxx.dtmax) as maxxx
    on minnn.cd_anomalie=maxxx.cd_anomalie""")

Here is the full exception log:

    org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'from' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 15, pos 0)
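When spark.sql rejects a long interpolated string like this, it helps to print the assembled query with line numbers first, so the (line, pos) coordinates in the ParseException can be traced back to the SQL source. A minimal sketch (the `numbered` helper and the shortened query are ours, for illustration only; `l` and `use_database` follow the question):

```scala
// Sketch: render a generated query with 1-based line numbers so the
// (line, pos) coordinates of a ParseException can be matched to the SQL.
// `numbered` is our own helper, not part of Spark; the query is shortened.
object QueryDebug {
  def numbered(sql: String): String =
    sql.split("\n").zipWithIndex
      .map { case (line, i) => f"${i + 1}%3d  $line" }
      .mkString("\n")

  def main(args: Array[String]): Unit = {
    val l = "1900-01-01 00:00:00.000001"   // values from the question
    val use_database = "dev_lkr_send"
    // Mixing s-interpolation with `+` concatenation (as in the question)
    // makes it easy to lose a quote or a space at the seams; printing the
    // final string shows exactly what the parser receives.
    val query = s"""select cd_anomalie
from ${use_database}.pz_send_param_ano
where '"""+l+"""' < dt_capt"""
    println(numbered(query))
  }
}
```

Comparing that printout against the line number in the exception narrows the problem to a single SQL line before any guessing starts.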

1 Answer:

Answer 0 (score: 1):

Try wrapping the select queries on lines 16 and 21 with an alias.

Example:

    val dfMet = spark.sql(s"""select 
        maxxx.cd_anomalie,
        maxxx.cd_famille,
        maxxx.libelle AS LIB_ANOMALIE,
        maxxx.MAJ_DATE AS DT_MAJ,
        maxxx.classification,
        maxxx.nb_rejeux AS NB_REJEUX,
        case when maxxx.indic_cd_erreur = 'O' then 1 else 0 end AS TOP_INDIC_CD_ERREUR,
        case when maxxx.invalidation_coordonnee = 'O' then 1 else 0 end AS TOP_COORDONNEE_INVALIDE,
        case when maxxx.typ_mvt = 'S' then 1 else 0 end AS TOP_SUPP,
        case when maxxx.typ_mvt = 'S' then to_date(substr(maxxx.dt_capt, 1, 19)) else null end AS DT_SUPP,
        minnn.typ_mvt,
        maxxx.typ_mvt,
        case when minnn.typ_mvt = 'C' then 'C' else 'M' end as TYP_MVT
    from 
      ((select s.cd_anomalie, s.cd_famille, s.libelle, s.maj_date, s.classification, s.nb_rejeux, s.dt_capt, s.typ_mvt from ${use_database}.pz_send_param_ano as s) s
        join
        (select cd_anomalie, min(dt_capt) as dtmin from ${use_database}.pz_send_param_ano where '"""+l+"""' < dt_capt group by cd_anomalie) as minn
        on s.cd_anomalie=minn.cd_anomalie and s.dt_capt=minn.dtmin) as minnn
    join
        ((select s.cd_anomalie, s.cd_famille, s.libelle, s.maj_date, s.classification, s.nb_rejeux, s.dt_capt, s.typ_mvt, s.indic_cd_erreur, s.invalidation_coordonnee from ${use_database}.pz_send_param_ano as s) s
        join
        (select cd_anomalie, max(dt_capt) as dtmax from ${use_database}.pz_send_param_ano group by cd_anomalie) as maxx
        on s.cd_anomalie=maxx.cd_anomalie and s.dt_capt=maxx.dtmax) as maxxx
    on minnn.cd_anomalie=maxxx.cd_anomalie""")
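Since the failing statement is built from deeply nested parenthesized subqueries glued together by string interpolation, a cheap pre-flight check on the assembled string can catch the most common assembly mistakes before Spark's parser does. A rough sketch (it counts characters rather than parsing SQL; `sqlLooksBalanced` is our own hypothetical helper, not a Spark API):

```scala
// Rough pre-flight check (not a SQL parser): verify that parentheses
// and single quotes in an assembled query string come out balanced.
// `sqlLooksBalanced` is our own helper, not part of Spark.
object SqlSanity {
  def sqlLooksBalanced(sql: String): Boolean = {
    val parensOk = sql.count(_ == '(') == sql.count(_ == ')')  // every ( has a )
    val quotesOk = sql.count(_ == '\'') % 2 == 0               // quotes come in pairs
    parensOk && quotesOk
  }

  def main(args: Array[String]): Unit = {
    println(sqlLooksBalanced("select * from (select 1 as x) t"))  // true
    println(sqlLooksBalanced("select * from (select 1 as x t"))   // false: unclosed paren
  }
}
```

A character count like this will not catch every mistake, but mismatched parentheses in nested derived tables are exactly the kind of error it surfaces without a round trip through Spark.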