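I am trying to convert an array of strings to a byte array in Spark, and then convert the byte array back to an array of strings.
However, I am not getting the String array back as expected. Here is the code: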
// UDFs for converting Array[String] to byte array and get back Array[String] from byte array
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.databind.ObjectMapper
val mapper: ObjectMapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)
val convertToByteArray = udf((map: Seq[String]) => mapper.writeValueAsBytes(map))
val convertToString = udf((a: Array[Byte])=> new String(a))
val arrayDF = Seq(
("x100", Array("p1","p2","p3","p4"))
).toDF("id", "myarray")
arrayDF.printSchema()
root
|-- id: string (nullable = true)
|-- myarray: array (nullable = true)
| |-- element: string (containsNull = true)
arrayDF.show(false)
+----+----------------+
|id |myarray |
+----+----------------+
|x100|[p1, p2, p3, p4]|
+----+----------------+
val converted = arrayDF.withColumn("bytearray", convertToByteArray($"myarray")).select($"id",$"bytearray")
converted.printSchema()
root
|-- id: string (nullable = true)
|-- bytearray: binary (nullable = true)
converted.show(false)
+----+----------------------------------------------------------------+
|id |bytearray |
+----+----------------------------------------------------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|
+----+----------------------------------------------------------------+
val getBack = converted.withColumn("getstring", convertToString($"bytearray"))
getBack.printSchema()
root
|-- id: string (nullable = true)
|-- bytearray: binary (nullable = true)
|-- getstring: string (nullable = true)
getBack.show(false)
+----+----------------------------------------------------------------+---------------------+
|id |bytearray |getstring |
+----+----------------------------------------------------------------+---------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|["p1","p2","p3","p4"]|
+----+----------------------------------------------------------------+---------------------+
However, I want my final result to be:
+----+----------------------------------------------------------------+---------------------+
|id |bytearray |getstring |
+----+----------------------------------------------------------------+---------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|[p1,p2,p3,p4]|
+----+----------------------------------------------------------------+---------------------+
This is the dependency from the pom.xml that I use to create the byte array:
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.9.5</version>
</dependency>
Answer 0 (score: 0)
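You take a list of strings and serialize it as a single object, and when converting back you treat the bytes as just one string, so you get the serialized text back, quotes included. If a single string is what you want returned, you also need to convert the list to a string before turning it into bytes: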
// Encode the joined string directly: mapper.writeValueAsBytes on a String would wrap it in JSON quotes.
val convertToByteArray = udf((map: Seq[String]) => map.mkString("[",",","]").getBytes("UTF-8"))
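If the goal is to get a real array of strings back rather than a single string, another option is to parse the JSON bytes instead of calling `new String(a)`. The following is a minimal sketch outside Spark, assuming `jackson-databind` and `jackson-module-scala` are on the classpath (the question's imports already require both). The object name `RoundTrip` and the helpers `toBytes` and `fromBytes` are hypothetical names used only for illustration.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

object RoundTrip {
  // Same mapper setup as in the question.
  val mapper: ObjectMapper = new ObjectMapper
  mapper.registerModule(DefaultScalaModule)

  // Seq[String] -> JSON bytes, exactly what the question's convertToByteArray does.
  def toBytes(xs: Seq[String]): Array[Byte] = mapper.writeValueAsBytes(xs)

  // JSON bytes -> Array[String]: parse the JSON rather than decoding it as plain text.
  def fromBytes(bytes: Array[Byte]): Array[String] =
    mapper.readValue(bytes, classOf[Array[String]])

  def main(args: Array[String]): Unit = {
    val bytes = toBytes(Seq("p1", "p2", "p3", "p4"))
    val back = fromBytes(bytes)
    // The elements come back without the JSON quotes.
    println(back.mkString("[", ",", "]")) // prints [p1,p2,p3,p4]
  }
}
```

In the Spark code from the question, the same idea would presumably look like `udf((a: Array[Byte]) => mapper.readValue(a, classOf[Array[String]]))` in place of `convertToString`, which should give the new column the type `array<string>` instead of `string`. This has not been run against the asker's Spark version, so treat it as a starting point.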