I have two pieces of similar code, but one works and the other does not.
The one that works:
private def loadValidStations() = {
  dfReader.csv(filePath + "stations.csv")
    .withColumn("lat", 'lat.cast(DoubleType))
    .withColumn("lon", 'lon.cast(DoubleType))
    .filter('lat.isNotNull && 'lon.isNotNull && !('stn.isNull && 'wban.isNull))
    .map(row => {
      val stn = row.getAs[String]("stn")
      val modifiedStn = if (stn == null) "_" else stn
      val wban = row.getAs[String]("wban")
      val modifiedWban = if (wban == null) "_" else wban
      (modifiedStn + "_" + modifiedWban, Location(row.getAs[Double]("lat"), row.getAs[Double]("lon")))
    })
}
The one that does not work:
private def getValidLocTemp(year: Int) = {
  dfReader.csv(filePath + year + ".csv")
    .withColumn("temp", 'temp.cast(DoubleType))
    .filter(!('stn.isNull && 'wban.isNull))
    .map(row => { // the exception is thrown at this line
      val stn = row.getAs[String]("stn")
      val modifiedStn = if (stn == null) "_" else stn
      val wban = row.getAs[String]("wban")
      val modifiedWban = if (wban == null) "_" else wban
      (modifiedStn + "_" + modifiedWban, row.getAs("month"), row.getAs("day"), row.getAs[Double]("temp"))
    })
}
For the one that does not work, I can print count() and show(10), and it contains 2,190,974 rows, but as soon as I try to map it throws an exception:
17/08/07 11:41:13 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.56.1:58389 (size: 20.6 KB, free: 1987.5 MB)
17/08/07 11:41:13 INFO SparkContext: Created broadcast 2 from csv at Extraction_SparkSQL.scala:74
17/08/07 11:41:13 INFO FileSourceScanExec: Planning scan with bin packing, max size: 5876545 bytes, open cost is considered as scanning 4194304 bytes.
Nothing (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
scala.MatchError: Nothing (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:706)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:385)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:384)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor(ScalaReflection.scala:384)
at org.apache.spark.sql.catalyst.ScalaReflection$.deserializerFor(ScalaReflection.scala:136)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:72)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)
at observatory.Extraction_SparkSQL$.getValidLocTemp(Extraction_SparkSQL.scala:77)
at observatory.Extraction_SparkSQL$.extraction(Extraction_SparkSQL.scala:90)
at observatory.ExtractionTest$$anonfun$1.apply$mcV$sp(ExtractionTest.scala:13)
at observatory.ExtractionTest$$anonfun$1.apply(ExtractionTest.scala:11)
at observatory.ExtractionTest$$anonfun$1.apply(ExtractionTest.scala:11)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
Answer 0 (score: 2)
This happens because you did not provide type annotations for row.getAs("month") and row.getAs("day"), so the Scala compiler infers their result type as Nothing. That is also why count() and show(10) succeed while map fails: map needs an implicit Encoder for the result tuple, and Spark's schema derivation (ScalaReflection.schemaFor in your stack trace) has no case for Nothing, hence the MatchError. Assuming both columns are strings, this should fix the problem:
(modifiedStn + "_" + modifiedWban, row.getAs[String]("month"), row.getAs[String]("day"), row.getAs[Double]("temp"))
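For completeness, here is a minimal sketch of the corrected function. It assumes month and day are read as strings by the CSV reader; dfReader, filePath, and the implicit encoders from spark.implicits._ come from the question's surrounding code, and the Option(...).getOrElse("_") form is just a compact equivalent of the null checks above:
private def getValidLocTemp(year: Int) = {
  dfReader.csv(filePath + year + ".csv")
    .withColumn("temp", 'temp.cast(DoubleType))
    .filter(!('stn.isNull && 'wban.isNull))
    .map(row => {
      // Replace null station ids with "_" before building the key.
      val stn = Option(row.getAs[String]("stn")).getOrElse("_")
      val wban = Option(row.getAs[String]("wban")).getOrElse("_")
      // Explicit type arguments keep every tuple component concrete,
      // so Spark can derive an Encoder for (String, String, String, Double).
      (stn + "_" + wban, row.getAs[String]("month"), row.getAs[String]("day"), row.getAs[Double]("temp"))
    })
}
If month and day are meant to be numeric, cast the columns first, e.g. .withColumn("month", 'month.cast(IntegerType)), and read them with row.getAs[Int]("month") instead.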