线程" main"中的例外情况java.lang.IllegalArgumentException:要求失败

Date: 2017-01-17 11:53:30

Tags: scala, spark-streaming, spark-dataframe

I am trying to dynamically append a timestamp column to a DataFrame built from this incoming data:


{"动作":"事件"" ID":1173," LAT":0.0" LON&# 34;:0.0" rollid":55,"事件":"类型"" CCD":0," FONE& #34;:"伊俄涅""版本":" 10.1""项目":"棚屋"}

To the incoming data above, I am trying to append the timestamp with the code below:

    73 foreachRDD(rdd =>
    74 {
    75   val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    76   import sqlContext.implicits._
    77   val dataframe = sqlContext.read.json(rdd.map(_._2)).toDF()  // parse the JSON payload of each (key, value) record
    78   import org.apache.spark.sql.functions._
    79   val newDF = dataframe.withColumn("Timestamp_val", current_timestamp())  // line 79, where the exception is thrown
    80   newDF.show()
    81   newDF.printSchema()
       })

This should give me output like the one below:

[screenshot: the expected DataFrame output with the appended Timestamp_val column]

But this code keeps giving me trouble: sometimes it prints the schema, and sometimes it throws the following exception at line 79:


    java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$14.apply(Analyzer.scala:354)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$14.apply(Analyzer.scala:353)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10.applyOrElse(Analyzer.scala:353)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10.applyOrElse(Analyzer.scala:347)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:347)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:328)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2126)
        at org.apache.spark.sql.DataFrame.select(DataFrame.scala:707)
        at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1188)
        at HiveGenerator$$anonfun$main$1.apply(HiveGenerator.scala:79)
        at HiveGenerator$$anonfun$main$1.apply(HiveGenerator.scala:73)

Where am I going wrong? Please help.

1 Answer:

Answer 0 (score: 2)

From the Stack Overflow chat I figured out how to fix it, like this:

    df.withColumn("current_time", lit(CurrentDate))

This works because the second argument to .withColumn() must point to a named column, and

    val newDF = dataframe.withColumn("Timestamp_val", current_timestamp())

does not generate a named column, hence the exception.
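
Putting it together, here is a minimal sketch of the corrected transformation. It assumes `dataframe` is the DataFrame from line 77 of the question; the answer does not show how `CurrentDate` is built, so the `SimpleDateFormat` helper below is a hypothetical illustration of one way to produce a driver-side String value:

    import java.text.SimpleDateFormat
    import java.util.Date

    import org.apache.spark.sql.functions.lit

    // Hypothetical helper: format the current wall-clock time once on the driver.
    val CurrentDate = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date())

    // lit() wraps the String in a literal Column, giving withColumn a
    // resolvable column expression instead of current_timestamp().
    val newDF = dataframe.withColumn("current_time", lit(CurrentDate))
    newDF.show()
    newDF.printSchema()

Note that with this approach every row of a given micro-batch gets the same stamp, since the value is computed once per foreachRDD invocation rather than per row.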