How to fix 'org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)'

Date: 2019-06-11 23:20:49

Tags: apache-spark spark-streaming

What I did (Structured Streaming):

> ./bin/pyspark
> spark
> static = spark.read.json("/data/activity-data/")
> dataSchema = static.schema
> streaming = spark.readStream.schema(dataSchema).option("maxFilesPerTrigger", 1)\
>     .json("/data/activity-data")
> activityCounts = streaming.groupBy("gt").count()
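
For reference, here are the same steps as one self-contained snippet. This is only a sketch of the list above: the `/data/activity-data` path and the `gt` column come from the question, and the final `writeStream` sink (an in-memory table) is my assumption about how the query would eventually be started, since a streaming aggregation does nothing until a sink is attached.

```python
from pyspark.sql import SparkSession

# In ./bin/pyspark the `spark` session already exists; building one here only
# makes the snippet runnable as a standalone script.
spark = SparkSession.builder.appName("activity-counts").getOrCreate()

# Infer the schema once from a static read of the same directory.
static = spark.read.json("/data/activity-data/")
dataSchema = static.schema
static.printSchema()  # sanity check: confirm a column named "gt" actually exists

# Read the same directory as a stream, one file per micro-batch.
streaming = (spark.readStream
             .schema(dataSchema)
             .option("maxFilesPerTrigger", 1)
             .json("/data/activity-data"))

# Streaming aggregation: row count per value of "gt".
activityCounts = streaming.groupBy("gt").count()

# Assumed sink: expose the running counts as an in-memory table.
query = (activityCounts.writeStream
         .queryName("activity_counts")
         .format("memory")
         .outputMode("complete")
         .start())
```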

Then I got this huge error. Could you help me fix it?

Error:

    org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:110)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:107)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:93)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:93)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:105)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:105)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:104)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:116)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$2.apply(QueryPlan.scala:121)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:121)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:126)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:126)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:93)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:107)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:85)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
    at org.apache.spark.sql.RelationalGroupedDataset.toDF(RelationalGroupedDataset.scala:65)
    at org.apache.spark.sql.RelationalGroupedDataset.count(RelationalGroupedDataset.scala:237)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

1 Answer:

Answer 0 (score: 0):

Greetings!

You will get this error if you have duplicate keys.
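
One way to read "duplicate keys" (my assumption, not something the answer spells out) is duplicate column names in the schema inferred from the JSON files, which makes column references ambiguous when the plan is analyzed. A quick check along those lines, reusing `dataSchema` from the question:

```python
from collections import Counter

# Hypothetical check: list field names that appear more than once in the
# inferred schema, since repeated names make references like "gt" ambiguous.
field_names = [f.name for f in dataSchema.fields]
duplicates = [name for name, count in Counter(field_names).items() if count > 1]

if duplicates:
    print("Duplicate column names in the inferred schema:", duplicates)
    # One way out: select or rename columns to unique names before groupBy.
else:
    print("No duplicate column names found; the cause likely lies elsewhere.")
```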

You can also refer to https://issues.apache.org/jira/browse/SPARK-10925.

Thanks! Kamleshkumar Gujarahti