How to replace a string value with a number in a Spark Dataset?

Asked: 2018-04-20 19:51:16

Tags: apache-spark apache-spark-dataset

For example, the input data:

1.0
\N

Schema:

val schema = StructType(Seq(
  StructField("value", DoubleType, false)
))

Reading the Spark Dataset:

val df = spark.read.schema(schema)
    .csv("/path/to/csv/file")

When I use this Dataset, I get an exception because "\N" is not a valid double. How can I replace "\N" with 0.0 in this Dataset? Thanks.

1 Answer:

Answer 0 (score: 0):

If the data is malformed, don't use a schema with an inappropriate type. Define the input as StringType:

val schema = StructType(Seq(
  StructField("value", StringType, false)
))

and cast data later:

val df = spark.read.schema(schema).csv("/path/to/csv/file")
  .withColumn("value", $"value".cast("double"))
  .na.fill(0.0)
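An alternative worth noting: Spark's CSV reader has a `nullValue` option that tells it to parse a given string (here "\N") as null, so the original DoubleType schema can be kept and the nulls filled afterwards. The sketch below assumes a local SparkSession and writes a small temp file matching the question's input; the file path and app name are illustrative only:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullValueDemo") // hypothetical app name
  .getOrCreate()

// Sample input matching the question: one valid double and one "\N" marker.
val path = Files.createTempFile("input", ".csv")
Files.write(path, "1.0\n\\N\n".getBytes)

val schema = StructType(Seq(
  StructField("value", DoubleType, false)
))

// nullValue makes the CSV reader treat "\N" cells as null,
// so they no longer break the DoubleType schema; na.fill
// then replaces those nulls with 0.0.
val df = spark.read
  .option("nullValue", "\\N")
  .schema(schema)
  .csv(path.toString)
  .na.fill(0.0)
```

This avoids the intermediate string column, at the cost of only handling the one marker string you name in `nullValue` rather than arbitrary malformed values.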