我在群集模式下的纱线群集上有一个spark sql 2.1.1作业,我想创建一个空的外部蜂巢表(带有位置的分区将在后面的步骤中添加)。
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
当我运行这份工作时,我收到错误:
CREATE EXTERNAL TABLE必须附有LOCATION
但是当我在Hue上的Hive Editor上运行相同的查询时,它运行得很好。我试图在SparkSQL 2.1.1文档中找到答案但是空洞了。
有谁知道为什么Spark SQL对查询更严格?
答案 0 :(得分:2)
TL; DR EXTERNAL
,但没有LOCATION
is not allowed。
明确的答案在Spark SQL的语法定义文件SqlBase.g4中。
您可以找到CREATE EXTERNAL TABLE
的定义为createTableHeader:
CREATE TEMPORARY? EXTERNAL? TABLE (IF NOT EXISTS)? tableIdentifier
此定义用于受支持的SQL statements。
除非我误认为locationSpec
是可选的。那是根据ANTLR语法。代码可能会另有决定,但似乎确实如此。
scala> spark.version
res4: String = 2.3.0-SNAPSHOT
val q = "CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'"
scala> sql(q)
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION(line 1, pos 0)
== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
^^^
at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1096)
at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1064)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:1064)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:55)
at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateHiveTableContext.accept(SqlBaseParser.java:1124)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
... 48 elided
默认SparkSqlParser
(astBuilder
为SparkSqlAstBuilder
)的following assertion导致异常:
if (external && location.isEmpty) {
operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
如果您认为应该允许这种情况,我会考虑在Spark's JIRA中报告问题。请参阅SPARK-2825以获得强有力的论据以获得支持:
CREATE EXTERNAL TABLE已经根据我的知识运行,并且应该与Hive具有相同的语义。