org.apache.spark.sql.AnalysisException when loading JSON files

Time: 2018-03-09 00:07:27

Tags: json apache-spark

I have a large data store of JSON objects that I am trying to load into a DataFrame in Spark, but I get an AnalysisException as soon as I try to do anything with the data. The input contains JSON objects in several different formats; some of them carry the same fields in a different order and/or at different nesting levels.
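For illustration only (these are made-up records, not my actual data), two input lines could put the same `SGLN` field at different levels like this:

    {"SGLN": "urn:epc:id:sgln:0614141.12345.0", "eventTime": "2018-03-08T10:00:00Z"}
    {"readPoint": {"SGLN": "urn:epc:id:sgln:0614141.12345.0"}, "eventTime": "2018-03-08T11:00:00Z"}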

I loaded the JSON with the following code:

    val messagesDF = spark.read.json("data/test")
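The load itself completes. Dumping the inferred schema (a standard Spark call) shows how the different record shapes were merged, which is presumably where the duplicate `SGLN` reference comes from:

    // Inspect the schema Spark inferred across all the input shapes;
    // the same field name appearing in more than one place hints at the conflict.
    messagesDF.printSchema()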

But as soon as I try to perform any operation on the data, I get the following stack trace:

Caused by: java.lang.reflect.InvocationTargetException: org.apache.spark.sql.AnalysisException: Reference 'SGLN' is ambiguous, could be: SGLN#1099, SGLN#1204.;
  at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:235)
  ... 48 more
Caused by: org.apache.spark.sql.AnalysisException: Reference 'SGLN' is ambiguous, could be: SGLN#1099, SGLN#1204.;
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:264)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:158)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:130)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at org.apache.spark.sql.types.StructType.foreach(StructType.scala:96)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at org.apache.spark.sql.types.StructType.map(StructType.scala:96)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
  at org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:83)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2814)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
  ... 52 more

Everything I have found on this error involves joins between separate DataFrames, which is admittedly similar to what I am doing, but I cannot go in and restructure the DataFrames to give the columns unique IDs.
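One direction I am considering, as an untested sketch: give the reader an explicit schema instead of letting Spark merge the conflicting inferred ones, then reference each `SGLN` by its fully qualified path under a distinct alias. The nested `readPoint` struct below is a placeholder guess, not the confirmed layout of my data.

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types._

    // Hypothetical layout: SGLN at the top level and again inside a
    // nested struct named "readPoint". Adjust to the real structure.
    val schema = StructType(Seq(
      StructField("SGLN", StringType),
      StructField("readPoint", StructType(Seq(
        StructField("SGLN", StringType)
      ))),
      StructField("eventTime", StringType)
    ))

    val messagesDF = spark.read.schema(schema).json("data/test")

    // Qualify every reference and alias it uniquely so the analyzer
    // never has to resolve a bare, ambiguous 'SGLN'.
    val disambiguated = messagesDF.select(
      col("SGLN").alias("topLevelSGLN"),
      col("readPoint.SGLN").alias("readPointSGLN")
    )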

0 Answers:

No answers yet.