I'm using Scala with Apache Spark 2.3.0 and a CSV file. The reason I'm doing this is that when I try to use the CSV for k-means, it says I have null values, but the same problem keeps appearing even when I try to fill those nulls:
scala> val df = sqlContext.read.format("com.databricks.spark.csv")
         .option("header", "true")
         .option("delimiter", ";")
         .schema(schema).load("33.csv")
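(Side note: the `schema` passed to .schema(...) isn't shown in the question. A hypothetical sketch of what it might contain, with types guessed from the output later in the session, where ID_CALLE and TIPO print as int and LONGITUD/LATITUD are assumed to be doubles:

import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}

// Hypothetical schema, reconstructed from the session output below;
// the real file has more columns ("... 6 more fields").
val schema = StructType(Seq(
  StructField("ID_CALLE", IntegerType),
  StructField("TIPO", IntegerType),
  StructField("LONGITUD", DoubleType),
  StructField("LATITUD", DoubleType)
  // ...remaining columns omitted; not recoverable from the question
))
)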
scala> df.na.fill(df.columns.zip(
         df.select(df.columns.map(mean(_)): _*).first.toSeq
       ).toMap)
scala> val featuresCols = Array("LONGITUD","LATITUD")
featuresCols: Array[String] = Array(LONGITUD, LATITUD)
scala> val featureCols = Array("LONGITUD","LATITUD")
featureCols: Array[String] = Array(LONGITUD, LATITUD)
scala> val assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
assembler: org.apache.spark.ml.feature.VectorAssembler = vecAssembler_440117601217
scala> val df2 = assembler.transform(df)
df2: org.apache.spark.sql.DataFrame = [ID_CALLE: int, TIPO: int ... 6 more fields]
scala> df2.show
Caused by: org.apache.spark.SparkException: Values to assemble cannot be null
Answer 0 (score: 1)
It looks like you did na.fill() but didn't assign the result to a DataFrame; na.fill() returns a new DataFrame rather than modifying df in place.
Try val nonullDF = df.na.fill(...)
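For completeness, a minimal sketch of the question's pipeline with that fix applied (`nonullDF` is the name suggested above; the column names and the mean-imputation expression come from the question itself):

import org.apache.spark.sql.functions.mean
import org.apache.spark.ml.feature.VectorAssembler

// Fill each column's nulls with that column's mean, assigning the
// result to a new DataFrame (na.fill does not modify df in place).
// This assumes every column is numeric: mean() returns null for
// non-numeric columns, and null fill values are not supported.
val nonullDF = df.na.fill(df.columns.zip(
  df.select(df.columns.map(mean(_)): _*).first.toSeq
).toMap)

// Assemble the features from the filled DataFrame, not the original df.
val assembler = new VectorAssembler()
  .setInputCols(featureCols)
  .setOutputCol("features")
val df2 = assembler.transform(nonullDF)
df2.show()

If the DataFrame contains non-numeric columns, mean() yields null for them, so those entries would need to be filtered out of the map before calling fill.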