I am running the following query through spark.sql() -
CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (PrimaryProductKey int, abc STRING, SCDStartDate timestamp NOT NULL, SCDEndDate timestamp, OMIDQFailedFlag boolean, OMIComputeDeletedFlag boolean NOT NULL, OMIComputeCreatedDate timestamp NOT NULL, OMIComputeModifiedDate timestamp NOT NULL ) Using delta LOCATION 'adl://psinsightsadlsdev01.azuredatalakestore.net//PPE/Compute/OMIDimensions/DimPrimaryProductGrouping/Full/'

When I execute the same query in a Spark SQL notebook on a Databricks cluster it works, but when I execute it locally from Scala it fails with the following error -

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (PrimaryProductKey int, abc STRING, SCDStartDate timestamp NOT'(line 1, pos 121)
== SQL ==
CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (PrimaryProductKey int, abc STRING, SCDStartDate timestamp NOT NULL, SCDEndDate timestamp, OMIDQFailedFlag boolean, OMIComputeDeletedFlag boolean NOT NULL, OMIComputeCreatedDate timestamp NOT NULL, OMIComputeModifiedDate timestamp NOT NULL ) Using delta LOCATION 'adl://psinsightsadlsdev01.azuredatalakestore.net//PPE/Compute/OMIDimensions/DimPrimaryProductGrouping/Full/'
-------------------------------------------------------------------------------------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at com.ms.omi.meta.execute.Execute$$anonfun$createSubjectAreaTables$1.apply(Execute.scala:55)
at com.ms.omi.meta.execute.Execute$$anonfun$createSubjectAreaTables$1.apply(Execute.scala:46)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ms.omi.meta.execute.Execute$.createSubjectAreaTables(Execute.scala:46)
at com.ms.omi.meta.entry.EntOmiMetaStore$.main(EntOmiMetaStore.scala:21)
at com.ms.omi.meta.entry.EntOmiMetaStore.main(EntOmiMetaStore.scala)
Process finished with exit code 1
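For reference, a minimal local driver that reproduces the call in the stack trace (the object name, appName, and master setting below are illustrative assumptions, not the original com.ms.omi.meta code):

import org.apache.spark.sql.SparkSession

object ReproduceParseError {
  def main(args: Array[String]): Unit = {
    // Local session; assumes open-source Spark 2.x on the classpath.
    val spark = SparkSession.builder()
      .appName("repro-not-null-parse-error")
      .master("local[*]")
      .getOrCreate()

    // The open-source Spark 2.x grammar has no rule for column-level
    // constraints, so parsing stops at the first "NOT NULL" (line 1,
    // pos 121) and throws the ParseException shown above.
    spark.sql(
      "CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (" +
        "PrimaryProductKey int, abc STRING, SCDStartDate timestamp NOT NULL, " +
        "SCDEndDate timestamp, OMIDQFailedFlag boolean, " +
        "OMIComputeDeletedFlag boolean NOT NULL, " +
        "OMIComputeCreatedDate timestamp NOT NULL, " +
        "OMIComputeModifiedDate timestamp NOT NULL) " +
        "USING delta " +
        "LOCATION 'adl://psinsightsadlsdev01.azuredatalakestore.net//PPE/Compute/OMIDimensions/DimPrimaryProductGrouping/Full/'")
  }
}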
Answer 0 (score: 1)
Open-source Spark does not support the NOT NULL constraint.

Databricks uses its own runtime with a large number of proprietary extensions, so features available there do not necessarily exist in the open-source Spark distribution. In fact, the other feature you are trying to use - Databricks Delta - is a proprietary extension as well.
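If you just need the DDL to go through the open-source parser locally, one workaround is to drop the constraints from the table definition and enforce non-nullability at write time instead. A minimal sketch, to be run in a spark-shell session where spark is predefined; USING parquet as a stand-in for delta and the CREATE DATABASE line are assumptions for illustration, and the adl:// location assumes the ADLS connector and credentials are configured (otherwise point LOCATION at a local path):

// Make sure the target database exists in the local catalog.
spark.sql("CREATE DATABASE IF NOT EXISTS OMIDimensionshql")

// Same DDL with the NOT NULL constraints removed so the open-source
// parser accepts it; USING parquet replaces delta, which this
// distribution cannot resolve. Non-null checks then have to happen
// at write time rather than in the table definition.
spark.sql(
  """CREATE TABLE IF NOT EXISTS OMIDimensionshql.DimPrimaryProduct (
    |  PrimaryProductKey int,
    |  abc STRING,
    |  SCDStartDate timestamp,
    |  SCDEndDate timestamp,
    |  OMIDQFailedFlag boolean,
    |  OMIComputeDeletedFlag boolean,
    |  OMIComputeCreatedDate timestamp,
    |  OMIComputeModifiedDate timestamp
    |) USING parquet
    |LOCATION 'adl://psinsightsadlsdev01.azuredatalakestore.net//PPE/Compute/OMIDimensions/DimPrimaryProductGrouping/Full/'
  """.stripMargin)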
Answer 1 (score: 0)
Just two points:

Delta has the same gap here. If you have to share existing dedicated clusters (Databricks 4.0, or anything below the 5.0 beta), you cannot put NOT NULL on columns in your DDL. If you can move to the 5.0 beta or the official 5.0 release, this is now supported: the Databricks team fixed it in the 5.0 beta and later, along with the 10K limitation in MERGE INTO.

You may then want to run the following:
sql("SET spark.databricks.delta.preview.enabled=true")
sql("SET spark.databricks.delta.merge.joinBasedMerge.enabled = true")