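Is there a way to stop Spark from escaping the backslash at the start of each column?

Time: 2019-08-26 20:57:52

Tags: apache-spark apache-spark-sql

I have a column that contains Windows addresses like the following: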

  

\ aod140med01MediaExtractorCatalog20190820Hub26727007444841620183_6727007462021489387.nmf

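When I read the dataset, the first backslash of the value is escaped once it is loaded into the column, and the value is printed as shown below. Is there a way to skip this?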

  

aod140med01MediaExtractorCatalog20190820Hub26727007444841620183_6727007462021489387.nmf

1 answer:

Answer 0: (score: 0)

By default, Apache Spark does not strip backslashes.

// Run in spark-shell, where sc and spark.implicits._ are already in scope.
val df1 = sc.parallelize(
  Seq(
    (1, "khan /, vaquar", "30", "/aod140med01MediaExtractorCatalog20190820Hub26727007444841620183_6727007462021489387.nmf"),
    (2, "Zidan /, khan", "5", "vkhan1MediaExtractorCatalog20190820Hub26727007444841620183_6727007462021489387.nmf"),
    (3, "Zerina khan", "1", "test")
  )
).toDF("id", "name", "age", "string")

// Print without truncation so the full strings are visible.
df1.show(false)
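To check the claim with a value that really does contain backslashes, a minimal sketch along the following lines can be used. The path `\\host\share\file.nmf` and the names `BackslashRoundTrip` and `roundTrip` are made up for illustration and do not come from the question. The sketch assumes a local Spark installation and only covers a DataFrame built in memory, not one loaded from a file.

```scala
import org.apache.spark.sql.SparkSession

object BackslashRoundTrip {
  // Puts one string into a single-column DataFrame and reads it back.
  def roundTrip(spark: SparkSession, value: String): String = {
    import spark.implicits._
    Seq(value).toDF("path").as[String].head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("backslash-round-trip")
      .getOrCreate()

    // Hypothetical path, not the one from the question. Triple quotes keep
    // the backslashes literal, so they do not need to be doubled.
    val path = """\\host\share\file.nmf"""

    val result = roundTrip(spark, path)
    println(result)

    // The value read back should be identical to the value put in.
    assert(result == path, "backslashes should survive unchanged")

    spark.stop()
  }
}
```

If this assertion holds in your environment, the backslash is most likely being lost earlier, either in how the source string is written or in how the data source parses it.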


Please share your complete code so the issue can be debugged further.
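If the column is loaded from a CSV file (an assumption, since the question does not show the read call), one setting that may be worth checking is the CSV reader's `escape` option, whose default value is a backslash. The sketch below reads the same file twice, once with the default escape character and once with the double quote as the escape character, so the two results can be compared. The sample row, the temporary file, and the names `CsvEscapeOption` and `readWithEscape` are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvEscapeOption {
  // Reads a headerless CSV file with the escape character set explicitly.
  def readWithEscape(spark: SparkSession, file: String, escapeChar: String): DataFrame =
    spark.read
      .option("header", "false")
      .option("escape", escapeChar)
      .csv(file)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-escape-option")
      .getOrCreate()

    // Hypothetical sample row: an id and a quoted path containing backslashes.
    // On disk the line is: 1,"\\host\share\file.nmf"
    val tmp = Files.createTempFile("paths", ".csv")
    Files.write(tmp, "1,\"\\\\host\\share\\file.nmf\"\n".getBytes("UTF-8"))

    // Default behaviour: backslash is the escape character.
    readWithEscape(spark, tmp.toString, "\\").show(false)

    // Alternative: use the double quote as the escape character, so that a
    // backslash is treated as an ordinary character.
    readWithEscape(spark, tmp.toString, "\"").show(false)

    spark.stop()
  }
}
```

Comparing the two printed tables on your own data shows whether the escape setting is what changes the leading backslash. If the data comes from a different source, such as JSON or a Hive table, this option does not apply.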