Here I am trying to dynamically add a timestamp column to a DataFrame built from this incoming data:
{"动作":"事件"" ID":1173," LAT":0.0" LON&# 34;:0.0" rollid":55,"事件":"类型"" CCD":0," FONE& #34;:"伊俄涅""版本":" 10.1""项目":"棚屋"}
To the incoming data above, I am trying to append a timestamp with the code below:
foreachRDD(rdd =>
74 {
75   val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
76   import sqlContext.implicits._
77   val dataframe = sqlContext.read.json(rdd.map(_._2)).toDF()
78   import org.apache.spark.sql.functions._
79   val newDF = dataframe.withColumn("Timestamp_val", current_timestamp())
80   newDF.show()
81   newDF.printSchema()
   })
This should give me the DataFrame with the timestamp column appended. But this code is giving me trouble: sometimes it prints the schema, and sometimes it throws the following exception at line 79:
java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$14.apply(Analyzer.scala:354)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$14.apply(Analyzer.scala:353)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10.applyOrElse(Analyzer.scala:353)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10.applyOrElse(Analyzer.scala:347)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:347)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:328)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2126)
    at org.apache.spark.sql.DataFrame.select(DataFrame.scala:707)
    at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1188)
    at HiveGenerator$$anonfun$main$1.apply(HiveGenerator.scala:79)
    at HiveGenerator$$anonfun$main$1.apply(HiveGenerator.scala:73)
Where am I going wrong? Please help.
Answer 0 (score: 2)
As I learned from a Stack Overflow chat, the way to fix it is like this:
df.withColumn("current_time", lit(CurrentDate))
because the second argument to .withColumn() must point to a named column, while

val newDF = dataframe.withColumn("Timestamp_val", current_timestamp())

does not generate a named column, hence the exception.
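A minimal sketch of how that fix might look inside the streaming loop, assuming (as in the question) a Kafka-style DStream of (key, value) pairs here called stream and a SparkContext called sc; the currentTime value is illustrative, since the CurrentDate variable from the answer's one-liner is not defined in the post:

    import org.apache.spark.sql.functions.lit

    stream.foreachRDD { rdd =>
      val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
      val dataframe = sqlContext.read.json(rdd.map(_._2))

      // Capture the wall-clock time once per batch on the driver and embed it
      // as a literal column. lit() wraps the value in a Column expression that
      // withColumn() can attach under the given name.
      val currentTime = new java.sql.Timestamp(System.currentTimeMillis())
      val newDF = dataframe.withColumn("Timestamp_val", lit(currentTime))

      newDF.show()
      newDF.printSchema()
    }

Note that a literal computed on the driver gives every row in the batch the same timestamp, whereas current_timestamp() is evaluated during query execution, so the literal is also the more predictable choice when a per-batch ingestion time is what you want.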