
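I am trying to process an XML file with the Databricks library. The file contains some special characters, such as ‘. After the data is written to a CSV file, the text looks like â€˜. To fix this I tried the following approaches.

Using translate: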

df.select($"column",translate($"column","T","A").as("new_column")).show()

Using a regular expression:

df.withColumn("column", concat_ws(",",$"column".cast(sql.types.StringType)))
  .select($"column",regexp_replace($"column","&#x2018;","AP").as("column"))

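Neither approach produced the correct output. The result still contains â€˜.

Is there a way to keep the text exactly as it appears in the input?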
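One possible reading of this result, offered as a sketch and not a confirmed diagnosis: an XML parser normally decodes `&#x2018;` into the single character ‘ (U+2018) before Spark sees the value, so the literal text `&#x2018;` no longer exists in the column and the replacement has nothing to match. The â€˜ sequence is what the UTF-8 bytes of ‘ look like when they are decoded as windows-1252, which would suggest the CSV is being opened with the wrong charset. The plain Scala below reproduces both effects without Spark; `MojibakeDemo` and `misread` are hypothetical names used only for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object MojibakeDemo {
  // Encode as UTF-8, then (wrongly) decode the same bytes as windows-1252.
  def misread(s: String): String =
    new String(s.getBytes(StandardCharsets.UTF_8), Charset.forName("windows-1252"))

  def main(args: Array[String]): Unit = {
    // Assumed value of the column after XML parsing: entities already decoded.
    val decoded = "Nectarine tree named \u2018Polar Zee\u2019"

    // The entity text is absent, so this replacement leaves the string unchanged.
    println(decoded.replaceAll("&#x2018;", "AP") == decoded)

    // \u00e2\u20ac\u02dc is "â€˜" and \u00e2\u20ac\u2122 is "â€™".
    val garbled = "Nectarine tree named \u00e2\u20ac\u02dcPolar Zee\u00e2\u20ac\u2122"
    println(misread(decoded) == garbled)
  }
}
```

If this is what is happening, the data in the file may be correct and only the viewer's encoding setting is wrong.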

Input: Nectarine tree named &#x2018;Polar Zee&#x2019;

Current output: Nectarine tree named â€˜Polar Zeeâ€™

Expected output: Nectarine tree named &#x2018;Polar Zee&#x2019;
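If the goal is literally to get the numeric character references back in the output, one option is to re-escape every non-ASCII character before writing. The following is a minimal sketch in plain Scala, assuming the column already holds the decoded characters; `EntityEscape` and `escapeNonAscii` are hypothetical names, not part of Spark or the Databricks library.

```scala
object EntityEscape {
  // Replace every non-ASCII code point with an XML numeric character
  // reference of the form &#xHHHH; and leave ASCII characters untouched.
  def escapeNonAscii(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      if (cp < 128) sb.append(cp.toChar)
      else sb.append("&#x").append(cp.toHexString.toUpperCase).append(";")
      // Advance by one or two chars so surrogate pairs are handled once.
      i += Character.charCount(cp)
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    println(escapeNonAscii("Nectarine tree named \u2018Polar Zee\u2019"))
  }
}
```

In a Spark job this function could presumably be wrapped with `udf` from `org.apache.spark.sql.functions` and applied to the column before writing. That wiring is untested here, and null values would need to be handled separately.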

Replace text in a Spark DataFrame

0 answers: