For example, the input data:
1.0
\N
Schema:
val schema = StructType(Seq(
  StructField("value", DoubleType, false)
))
Reading the Spark dataset:
val df = spark.read.schema(schema)
  .csv("/path/to/csv/file")
When I use this dataset I get an exception, because "\N" is not a valid double. How can I replace "\N" with 0.0 in this dataset? Thanks.
Answer 0 (score: 0)
If the data is malformed, don't use a schema with an inappropriate type. Define the input as StringType:
val schema = StructType(Seq(
  StructField("value", StringType, false)
))
and cast the data later:
val df = spark.read.schema(schema).csv("/path/to/csv/file")
  .withColumn("value", $"value".cast("double"))
  .na.fill(0.0)
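Alternatively, a minimal sketch (assuming an existing SparkSession named spark and a placeholder path): the CSV reader's nullValue option can treat the literal "\N" token as null during parsing, so the original DoubleType schema can be kept and the nulls filled afterwards:

import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Keep the DoubleType schema, but mark the column nullable so "\N" can map to null.
val schema = StructType(Seq(
  StructField("value", DoubleType, true)
))

// Treat the literal "\N" as null while parsing, then replace nulls with 0.0.
// The path below is a placeholder, not the asker's actual file.
val df = spark.read
  .schema(schema)
  .option("nullValue", "\\N")
  .csv("/path/to/csv/file")
  .na.fill(0.0, Seq("value"))

Whether this is enough depends on how the file encodes missing values; if other malformed tokens can appear, the StringType-then-cast approach above is more robust, since cast("double") yields null for anything unparseable.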