Question

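In short, I'm using spark-xml to parse XML files. However, doing so strips the leading zeros from all the values I'm interested in. I need the final output, a DataFrame, to keep those leading zeros, and I haven't been able to find a way to add leading zeros back to the columns I care about.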


Here is how I'm reading the file:

val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "output")
  .option("excludeAttribute", true)
  .option("allowNumericLeadingZeros", true) //including this does not solve the problem
  .load("pathToXmlFile")

Sample output I'm receiving:

+------+---+--------------------+
|iD    |val|Code                |
+------+---+--------------------+
|1     |44 |9022070536692784476 |
|2     |66 |-5138930048185086175|
|3     |25 |805582856291361761  |
|4     |17 |-9107885086776983000|
|5     |18 |1993794295881733178 |
|6     |31 |-2867434050463300064|
|7     |88 |-4692317993930338046|
|8     |44 |-4039776869915039812|
|9     |20 |-5786627276152563542|
|10    |12 |7614363703260494022 |
+------+---+--------------------+
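A likely explanation for the output above is schema inference: spark-xml tries to infer a type for each column, so text like "007" becomes the number 7, and a number has no leading zeros to keep. spark-xml's README lists an `inferSchema` option; setting `.option("inferSchema", false)` on the reader should leave every column as a string and preserve the original text, though that has not been verified against this file. The plain-Scala sketch below (no Spark needed, helper names are hypothetical) mimics the loss and shows that restoring the zeros requires knowing the target width.

```scala
object LeadingZeroLoss {
  // Mimics numeric schema inference: "007" is parsed as a Long,
  // and the leading zeros are not part of the numeric value.
  def inferAsLong(raw: String): Long = raw.trim.toLong

  // Padding can bring zeros back, but only if the original width is known.
  def restore(n: Long, width: Int): String = s"%0${width}d".format(n)

  def main(args: Array[String]): Unit = {
    val parsed = inferAsLong("007")
    println(parsed)             // the zeros are gone
    println(restore(parsed, 3)) // zeros reapplied using an assumed width of 3
  }
}
```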

Answer 1

This solved my problem. Thanks, everyone, for the help:

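import org.apache.spark.sql.functions.format_string
import spark.implicits._ // needed for the $"iD" column syntax
val df2 = df
  .withColumn("idLong", format_string("%03d", $"iD")) // zero-pad iD to at least 3 digits; the result is a string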
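Spark documents `format_string` as printf-style formatting, so "%03d" pads to a minimum width of 3 rather than prepending a fixed number of zeros. Assuming it follows the same `java.util.Formatter` rules as Scala's `String.format`, the edge cases can be checked locally without a cluster: values wider than 3 digits are left untouched, and a negative value keeps its sign ahead of the padding. `PadPreview` and `pad3` are hypothetical names.

```scala
object PadPreview {
  // Same pattern as the answer's format_string("%03d", $"iD").
  def pad3(n: Long): String = "%03d".format(n)

  def main(args: Array[String]): Unit = {
    // Prints each input next to its padded form.
    Seq(1L, 10L, 1234L, -5L).foreach(n => println(s"$n -> ${pad3(n)}"))
  }
}
```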

Answer 2

You can simply use the built-in concat function:

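import org.apache.spark.sql.functions.{col, concat, lit}

df.withColumn("iD", concat(lit("00"), col("iD")))   // prepend "00" to every iD
  .withColumn("val", concat(lit("0"), col("val")))  // prepend "0" to every val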
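Note that `concat` always prepends the same prefix, so the resulting width varies with the value: iD 1 becomes "001" but iD 10 becomes "0010". The plain-Scala sketch below contrasts that with fixed-width padding; `PrefixVsPad`, `prefix`, and `padTo` are hypothetical helpers, and `padTo` assumes non-negative values. In Spark itself, `lpad(col, len, pad)` is another fixed-width option, with the caveat that it shortens values longer than `len`.

```scala
object PrefixVsPad {
  // Mirrors concat(lit(zeros), col(...)): the same prefix regardless of length.
  def prefix(zeros: String, value: Long): String = zeros + value.toString

  // Hypothetical fixed-width padding for comparison; assumes non-negative input
  // and leaves values that are already wide enough unchanged.
  def padTo(width: Int, value: Long): String = {
    require(value >= 0, "padTo assumes non-negative values")
    val s = value.toString
    if (s.length >= width) s else "0" * (width - s.length) + s
  }

  def main(args: Array[String]): Unit = {
    Seq(1L, 10L).foreach { n =>
      println(s"$n: prefix=${prefix("00", n)} padTo=${padTo(3, n)}")
    }
  }
}
```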

Adding leading zeros to columns in a Spark DataFrame
