Resolving java.lang.IllegalArgumentException: Field "null" does not exist when pivoting a DataFrame

Asked: 2018-09-12 16:06:20

Tags: apache-spark apache-spark-sql

I am using the following expression in Scala to transpose rows into columns in a DataFrame:

import org.apache.spark.sql.functions.first
import spark.implicits._  // toDF and the $-column syntax (spark-shell's SparkSession)

val df = Seq(
  ("ID-1", "First Name", "Jolly"),
  ("ID-1", "Middle Name", "Jr"),
  ("ID-1", "Last Name", "Hudson"),
  ("ID-2", "First Name", "Kathy"),
  ("ID-2", "Last Name", "Oliver"),
  ("ID-3", "Last Name", "Short"),
  ("ID-3", "Middle Name", "M"),
  ("ID-4", "First Name", "Denver")
).toDF("ID", "Title", "Values")

df.filter($"Title" isin ("First Name", "Last Name", "Middle Name")).
  groupBy("ID").pivot("Title").agg(first($"Values")).   
  select( $"ID", $"First Name", $"Last Name", $"Middle Name").  
  show(false) 


 // +----+----------+---------+-----------+  
 // |ID  |First Name|Last Name|Middle Name|  
 // +----+----------+---------+-----------+  
 // |ID-1|Jolly     |Hudson   |Jr         |
 // |ID-3|null      |Short    |M          |
 // |ID-4|Denver    |null     |null       |
 // |ID-2|Kathy     |Oliver   |null       |
 // +----+----------+---------+-----------+
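
For reference, pivot also has an overload that takes the distinct pivot values explicitly. Pinning them makes the output schema deterministic and skips the extra pass Spark otherwise runs to discover the values. A minimal sketch against the same df, not part of the original question:

// Passing the pivot values explicitly fixes the column set up front,
// so the schema no longer depends on what happens to be in "Title"
df.groupBy("ID")
  .pivot("Title", Seq("First Name", "Last Name", "Middle Name"))
  .agg(first($"Values"))
  .show(false)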

The output is as expected, but the job eventually fails with the following exception:

java.lang.IllegalArgumentException: Field "null" does not exist.

Since I do get the expected output, please help me understand what causes this exception and how to fix it.

Here is the error log:

2018-09-12 12:09:54 [Executor task launch worker-1] ERROR o.a.s.e.Executor - Exception in task 15.0 in stage 69.0 (TID 4453)
java.lang.IllegalArgumentException: Field "null" does not exist.
    at org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:233)
    at org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:233)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at scala.collection.AbstractMap.getOrElse(Map.scala:58)
    at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:232)
    at org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.fieldIndex(rows.scala:213)
    at gbam.refdata.dataquality.utils.DataQualityRule$class.getColumn(DataQualityRule.scala:147)
    at gbam.refdata.dataquality_rules2.VendorpartyAddress.getColumn(VendorpartyAddress.scala:27)
    at gbam.refdata.dataquality.utils.DataQualityRule$$anonfun$getMissing$1$1.apply(DataQualityRule.scala:153)
    at gbam.refdata.dataquality.utils.DataQualityRule$$anonfun$getMissing$1$1.apply(DataQualityRule.scala:153)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at gbam.refdata.dataquality.utils.DataQualityRule$class.getMissing$1(DataQualityRule.scala:152)
    at gbam.refdata.dataquality.utils.DataQualityRule$class.getBreaks(DataQualityRule.scala:156)
    at gbam.refdata.dataquality_rules2.VendorpartyAddress.getBreaks(VendorpartyAddress.scala:27)
    at gbam.refdata.dataquality_rules2.VendorpartyAddress$$anonfun$4.apply(VendorpartyAddress.scala:103)
    at gbam.refdata.dataquality_rules2.VendorpartyAddress$$anonfun$4.apply(VendorpartyAddress.scala:103)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
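
The trace shows the exception is not raised by the pivot itself but by StructType.fieldIndex, reached through GenericRowWithSchema.fieldIndex from the custom DataQualityRule.getColumn: some downstream code is looking up a field whose name is literally the string "null", which typically happens when a null value gets stringified into a column name. A minimal sketch that reproduces the same exception; the schema and lookup here are illustrative, not taken from the original code:

import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(StructField("ID", StringType)))
val row = new GenericRowWithSchema(Array[Any]("ID-1"), schema)

val missing: String = null
// Interpolating a null produces the string "null", so the lookup becomes
// fieldIndex("null") and throws: Field "null" does not exist.
row.fieldIndex(s"$missing")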

0 answers:

No answers yet.