I'm running a simple piece of code in spark-shell:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val test_data = sqlContext.read.json("music.json")
test_data.registerTempTable("test_data")
val temp1 = sqlContext.sql("select user.id_str as userid, text from test_data")
val temp2 = temp1.map(t => (t.getAs[String]("userid"), t.getAs[String]("text").split('@').length - 1))
Up to this point everything works. Then I try to save the result:
temp2.saveAsTextFile("test")
and it throws this:
16/05/18 20:05:14 ERROR Executor: Exception in task 1.0 in stage 15.0 (TID 23)
java.lang.NullPointerException
at scala.collection.immutable.StringLike$class.split(StringLike.scala:201)
at scala.collection.immutable.StringOps.split(StringOps.scala:31)
at $line62.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:31)
at $line62.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:31)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1198)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
I have no experience with Java. I'd like to know: does the problem come from my Scala code, or is something wrong with my configuration?
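My guess (please correct me if I'm wrong): the map is lazy and only actually runs when saveAsTextFile forces the computation, and the trace points at StringLike.split inside my anonymous function, so perhaps some records have a null text field. Would a null guard like the sketch below be the right fix? (The Option/getOrElse handling is just my own attempt, not something taken from the error message.)

val temp2 = temp1.map { t =>
  // treat a missing/null "text" field as zero '@' mentions (my assumption)
  val text = Option(t.getAs[String]("text")).getOrElse("")
  (t.getAs[String]("userid"), text.split('@').length - 1)
}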