I'm trying to replace every `\` character in my CSV output with `\\` so the values can be read back correctly. Here's my UDF:
def escapeBackslash: String => String = _.replaceAll("[_\\]","\\\\")
def escapeBackslashUDF = udf(escapeBackslash)
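(A quick standalone check, plain Scala with no Spark needed, shows why this UDF misbehaves: `replaceAll` takes a regex as its first argument, and the string `"[_\\]"` compiles to the pattern `[_\]`, where `\]` escapes the bracket and leaves the character class unclosed; the replacement `"\\\\"` also collapses to a single backslash. The object name below is just for the demo:)

```scala
import scala.util.Try

object EscapeDemo extends App {
  val s = "myName\\"                                    // raw value: myName\
  // "[_\\]" is the regex [_\] -- the \] escapes the bracket, so the
  // character class never closes and pattern compilation fails:
  println(Try(s.replaceAll("[_\\]", "\\\\")).isFailure) // true
  // A literal replace doubles only the backslash:
  println(s.replace("\\", "\\\\"))                      // myName\\
}
```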
`\N` is fine (that's just the null marker), so I don't need to worry about those, but I'm getting this output:
123,myName\,myDesc,245,true
345,anotherName,\N,600,\N
789,name3,desc3,\N,false
Any help?
Here's the input:
val schema = StructType(Seq(StructField("id", StringType), StructField("name", StringType), StructField("dec", StringType), StructField("amount",IntegerType), StructField("enabled",BooleanType)))
val rdd = spark.sparkContext.parallelize(Seq(Row("123", "myName\\", "myDesc",245, true), Row("345","anotherName",null,600,null), Row("789","name3","desc3",null,false)))
This is the output I'm trying to get, with the backslashes escaped:
123,myName\\,myDesc,245,true
345,anotherName,\N,600,\N
789,name3,desc3,\N,false
Answer 0 (score: 0)
What I did was write a single UDF for the string fields, register it, then walk the schema to find the string columns and select those columns through the UDF.
val func: (String) => String = (test: String) => {
  Option(test) match {
    case Some(t) => t.replace("\\", "\\\\")
    case _ => null // keep nulls as nulls so they still serialize as \N
  }
}
val fields = dfOut.schema.fields
  .map { col =>
    if (col.dataType.isInstanceOf[StringType]) s"${EscapeBackslashUDF.name}(${col.name})"
    else s"${col.name}"
  }.toSeq
super.write(target, dfOut.selectExpr(fields:_*) )
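The same idea can also be sketched without `selectExpr`, by folding over the string columns with `withColumn`. This is a minimal sketch, not the answer's exact code: it assumes an existing DataFrame `dfOut` and output path `target`, and inlines the escape function as a fresh UDF rather than reusing the registered `EscapeBackslashUDF`:

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.StringType

// Null-safe escape: double every backslash, pass nulls through as \N.
val escape = udf { s: String =>
  if (s == null) null else s.replace("\\", "\\\\")
}

// Rewrite every string column in place; other columns are untouched.
val escaped = dfOut.schema.fields
  .filter(_.dataType == StringType)
  .foldLeft(dfOut) { (df, f) => df.withColumn(f.name, escape(df(f.name))) }

escaped.write.csv(target)
```

`foldLeft` keeps the column order intact, which matters for CSV output, whereas building a `selectExpr` string requires the UDF to be registered by name first.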