The following query runs fine for me in SQL Developer:
SELECT C.CIS_DIVISION,
       C.EFFDT AS START_DT,
       LEAD(EFFDT, 1) OVER (PARTITION BY CIS_DIVISION, CHAR_TYPE_CD
                            ORDER BY CIS_DIVISION, CHAR_TYPE_CD, EFFDT) - 1 AS END_DT,
       C.CHAR_VAL,
       C.CHAR_TYPE_CD
FROM CI_CIS_DIV_CHAR C
WHERE C.CHAR_TYPE_CD IN ('C1-TFMPD', 'C1-TFMCR')
ORDER BY CIS_DIVISION, CHAR_TYPE_CD, EFFDT
Output in SQL Developer:
CIS_DIVISION START_DT END_DT CHAR_VAL CHAR_TYPE_CD
747 01-Jan-10 (null) BATCH_DT C1-TFMPD
CAL 01-Jan-16 (null) BATCH_DT C1-TFMPD
NYC 01-Jan-90 (null) BATCH_DT C1-TFMPD
PERF1 01-Jan-01 (null) BATCH_DT C1-TFMPD
PERF2 01-Jan-01 (null) BATCH_DT C1-TFMPD
PERF3 01-Jan-01 (null) BATCH_DT C1-TFMPD
But when I run the same query in Spark (launched from Eclipse), it throws an error.
Code:
private static final String DIVISIONCHARS_QUERY = "SELECT C.CIS_DIVISION, "
        + "C.EFFDT AS START_DT, "
        + "LEAD(EFFDT, 1) OVER(PARTITION BY CIS_DIVISION, "
        + "CHAR_TYPE_CD ORDER BY CIS_DIVISION, "
        + "CHAR_TYPE_CD, EFFDT) - 1 AS END_DT, "
        + "C.CHAR_VAL, "
        + "C.CHAR_TYPE_CD "
        + "FROM CI_CIS_DIV_CHAR C "
        + "WHERE C.CHAR_TYPE_CD IN ('C1-TFMPD','C1-TFMCR') ORDER BY CIS_DIVISION, CHAR_TYPE_CD, EFFDT";
Dataset<Row> divChartable = sparkSession.read().format("jdbc")
        .option("url", connection)
        .option("dbtable", "CI_CIS_DIV_CHAR")
        .load();
divChartable.registerTempTable("CI_CIS_DIV_CHAR");
Dataset<Row> divCharsDS = sparkSession.sql(DIVISIONCHARS_QUERY);
Since the query reads from the CI_CIS_DIV_CHAR table, I first register it as a temporary table; otherwise Spark fails with a table-not-found error.
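As an aside, registerTempTable is deprecated in Spark 2.x in favor of createOrReplaceTempView. A minimal sketch of the same registration step with the newer API (the JDBC URL here is a placeholder, not my real connection string):

Dataset<Row> divChartable = sparkSession.read().format("jdbc")
        .option("url", "jdbc:oracle:thin:@//host:1521/service") // placeholder URL
        .option("dbtable", "CI_CIS_DIV_CHAR")
        .load();
// Same effect as registerTempTable, without the deprecation warning.
divChartable.createOrReplaceTempView("CI_CIS_DIV_CHAR");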
Running the code above produces the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve '(lead(C.`EFFDT`, 1, NULL) OVER (PARTITION BY C.`CIS_DIVISION`, C.`CHAR_TYPE_CD` ORDER BY C.`CIS_DIVISION` ASC NULLS FIRST, C.`CHAR_TYPE_CD` ASC NULLS FIRST, C.`EFFDT` ASC NULLS
FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - 1)' due to data type mismatch: differing types in '(lead(C.`EFFDT`, 1, NULL) OVER (PARTITION BY C.`CIS_DIVISION`, C.`CHAR_TYPE_CD` ORDER BY C.`CIS_DIVISION`
ASC NULLS FIRST, C.`CHAR_TYPE_CD` ASC NULLS FIRST, C.`EFFDT` ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - 1)' (timestamp and int).; line 1 pos 44;
'Sort ['CIS_DIVISION ASC NULLS FIRST, 'CHAR_TYPE_CD ASC NULLS FIRST, 'EFFDT ASC NULLS FIRST], true
+- 'Project [CIS_DIVISION#1424, EFFDT#1426 AS START_DT#1448, (lead(EFFDT#1426, 1, null) windowspecdefinition(CIS_DIVISION#1424, CHAR_TYPE_CD#1425, CIS_DIVISION#1424 ASC NULLS FIRST, CHAR_TYPE_CD#1425 ASC NULLS FIRST,
EFFDT#1426 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) - 1) AS END_DT#1449, CHAR_VAL#1427, CHAR_TYPE_CD#1425]
+- Filter CHAR_TYPE_CD#1425 IN (C1-TFMPD,C1-TFMCR)
+- SubqueryAlias C
+- SubqueryAlias ci_cis_div_char
+- Relation[CIS_DIVISION#1424,CHAR_TYPE_CD#1425,EFFDT#1426,CHAR_VAL#1427,VERSION#1428,ADHOC_CHAR_VAL#1429,CHAR_VAL_FK1#1430,CHAR_VAL_FK2#1431,CHAR_VAL_FK3#1432,CHAR_VAL_FK4#1433,CHAR_VAL_FK5#1434,
ENABLED_FLG#1435] JDBCRelation(CI_CIS_DIV_CHAR) [numPartitions=1]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:116)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:120)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:120)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:95)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at com.sample.Transformation.initializeProductDerivationCache(Transformation.java:189)
at com.sample.Transformation.main(Transformation.java:114)
Does Oracle's date arithmetic simply not work in Spark SQL, or is there some other problem here?
Spark version used: 2.3.0
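From the type-mismatch message, Spark appears to reject subtracting an integer from a timestamp column, while Oracle allows DATE - 1. If it helps frame the question, this is the untested rewrite I am considering, using Spark SQL's date_sub function instead of the bare - 1 (DIVISIONCHARS_QUERY_SPARK is just an illustrative name; I also dropped the partition columns from the window ORDER BY, since they are constant within each partition):

private static final String DIVISIONCHARS_QUERY_SPARK = "SELECT C.CIS_DIVISION, "
        + "C.EFFDT AS START_DT, "
        // date_sub subtracts whole days; Spark casts the timestamp to a date first
        + "DATE_SUB(LEAD(EFFDT, 1) OVER(PARTITION BY CIS_DIVISION, CHAR_TYPE_CD "
        + "ORDER BY EFFDT), 1) AS END_DT, "
        + "C.CHAR_VAL, "
        + "C.CHAR_TYPE_CD "
        + "FROM CI_CIS_DIV_CHAR C "
        + "WHERE C.CHAR_TYPE_CD IN ('C1-TFMPD','C1-TFMCR') "
        + "ORDER BY CIS_DIVISION, CHAR_TYPE_CD, EFFDT";

One caveat I am aware of: date_sub returns a DATE rather than a TIMESTAMP, so END_DT would need an explicit CAST if the original timestamp type has to be preserved.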