我们以前使用的是Spark 2.3,现在使用的是2.4:
Spark version 2.4.0
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
我们在生产中运行了一段代码,将csv文件转换为拼花格式。 我们已设置csv加载的选项之一是option(“ nullValue”,null)。它在Spark 2.4中的工作方式有问题。
这是显示问题的示例。
C0,C1,C2,C3,C4,C5
1,"1234",0.00,"","D",0.00
2,"",0.00,"","D",0.00
scala> val data1 = spark.read.option("header", "true").option("inferSchema", "true").option("treatEmptyValuesAsNulls","true").option("nullValue", null).csv("file:///tmp/test.csv")
we get an empty row:
scala> data1.show
+----+----+----+----+----+----+
| C0| C1| C2| C3| C4| C5|
+----+----+----+----+----+----+
| 1|1234| 0.0| | D| 0.0|
|null|null|null|null|null|null|
+----+----+----+----+----+----+
C0,C1,C2,C3,C4,C5
1,"1234",0.00,"","D",0.00
2,"",0.00,"1","D",0.00
结果更糟:
scala> val data2 = spark.read.option("header", "true").option("inferSchema", "true").option("treatEmptyValuesAsNulls","true").option("nullValue", null).csv("file:///tmp/test.csv")
scala> data2.show
+----+----+----+----+----+----+
| C0| C1| C2| C3| C4| C5|
+----+----+----+----+----+----+
|null|null|null|null|null|null|
|null|null|null|null|null|null|
+----+----+----+----+----+----+
这是Spark 2.4.0新版本中的错误吗?任何机构都面临类似的问题?
答案 0 :(得分:1)
火花选项 emptyValue 已解决问题
task-executor