CREATE TABLE in Spark SQL does not support NOT NULL

Date: 2018-10-22 12:04:15

Tags: sql scala apache-spark databricks

I ran the following query via spark.sql():

CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (
    PrimaryProductKey int,
    abc STRING,
    SCDStartDate timestamp NOT NULL,
    SCDEndDate timestamp,
    OMIDQFailedFlag boolean,
    OMIComputeDeletedFlag boolean NOT NULL,
    OMIComputeCreatedDate timestamp NOT NULL,
    OMIComputeModifiedDate timestamp NOT NULL
)
USING delta
LOCATION 'adl://psinsightsadlsdev01.azuredatalakestore.net//PPE/Compute/OMIDimensions/DimPrimaryProductGrouping/Full/'

but got the following error:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (PrimaryProductKey int, abc STRING, SCDStartDate timestamp NOT'(line 1, pos 121)

== SQL ==
CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (PrimaryProductKey int, abc STRING, SCDStartDate timestamp NOT NULL, SCDEndDate timestamp, OMIDQFailedFlag boolean, OMIComputeDeletedFlag boolean NOT NULL, OMIComputeCreatedDate timestamp NOT NULL, OMIComputeModifiedDate timestamp NOT NULL ) Using delta LOCATION 'adl://psinsightsadlsdev01.azuredatalakestore.net//PPE/Compute/OMIDimensions/DimPrimaryProductGrouping/Full/'
-------------------------------------------------------------------------------------------------------------------------^^^

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at com.ms.omi.meta.execute.Execute$$anonfun$createSubjectAreaTables$1.apply(Execute.scala:55)
    at com.ms.omi.meta.execute.Execute$$anonfun$createSubjectAreaTables$1.apply(Execute.scala:46)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at com.ms.omi.meta.execute.Execute$.createSubjectAreaTables(Execute.scala:46)
    at com.ms.omi.meta.entry.EntOmiMetaStore$.main(EntOmiMetaStore.scala:21)
    at com.ms.omi.meta.entry.EntOmiMetaStore.main(EntOmiMetaStore.scala)

Process finished with exit code 1

The same query works when I execute it from a Spark SQL notebook on a Databricks cluster, but it fails with the ParseException above when I execute it locally from Scala via spark.sql().

2 Answers:

Answer 0 (score: 1)

NOT NULL constraints are not supported in the standard Spark runtime.

Databricks uses its own runtime with a large number of proprietary extensions, so features available there do not necessarily exist in the open-source Spark distribution.

In fact, the other feature you are trying to use, Databricks Delta, is also a proprietary extension.
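As an illustrative aside (not part of the original answer): if you only need the nullability recorded when running on open-source Spark, you can express it on a programmatic schema instead of in the DDL. A minimal Scala sketch using the question's column names; note that in open-source Spark, nullable = false on a StructField is optimizer metadata, not an enforced constraint:

import org.apache.spark.sql.types._

// Schema mirroring the question's DDL. In open-source Spark, nullable = false
// is a hint used by the optimizer rather than an enforced NOT NULL constraint.
val dimPrimaryProductSchema = StructType(Seq(
  StructField("PrimaryProductKey", IntegerType, nullable = true),
  StructField("abc", StringType, nullable = true),
  StructField("SCDStartDate", TimestampType, nullable = false),
  StructField("SCDEndDate", TimestampType, nullable = true),
  StructField("OMIDQFailedFlag", BooleanType, nullable = true),
  StructField("OMIComputeDeletedFlag", BooleanType, nullable = false),
  StructField("OMIComputeCreatedDate", TimestampType, nullable = false),
  StructField("OMIComputeModifiedDate", TimestampType, nullable = false)
))

A DataFrame built with this schema can then be written out without the SQL parser ever seeing a NOT NULL clause.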

Answer 1 (score: 0)

Just two points:

  1. Delta also had a flaw with this same thing. If you are on a shared existing dedicated cluster (Databricks 4.0, or any cluster below the 5.0 beta), you cannot apply NOT NULL to columns in DDL. If you can use the 5.0 beta or the official 5.0 release, this is now supported: the Databricks team fixed the issue in 5.0 beta and above, along with the 10K limitation in MERGE INTO.

  2. You may want to run the following:

sql("SET spark.databricks.delta.preview.enabled=true")

sql("SET spark.databricks.delta.merge.joinBasedMerge.enabled = true")