How do I convert an array of strings to a string column?

Time: 2017-06-28 02:17:15

Tags: json scala apache-spark dataframe apache-spark-sql

I want to save JSON into a table in MySQL.

After some reading, I found that the path for loading the data into MySQL is JSON -> DataFrame -> MySQL.

{"name":"Johny","hobbies":["swiming","cooking"]}
{"name":"James","hobbies":["baseketball","fishing"]}
{"name":"Tom","hobbies":["singing","football"]}

I read the JSON file with the following commands:

val df = sqlContext.read.json("test.json")
df.show()
df.printSchema()

and get this output:

+--------------------+-----+                                                    
|             hobbies| name|
+--------------------+-----+
|  [swiming, cooking]|Johny|
|[baseketball, fis...|James|
| [singing, football]|  Tom|
+--------------------+-----+

root
 |-- hobbies: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- name: string (nullable = true)

When I run the following command:

df.registerTempTable("mytable")
sqlContext.
  sql("SELECT * FROM mytable").
  write.
  mode(SaveMode.Append).
  jdbc(url,"jsontest",prop)

I get the following error:

java.lang.IllegalArgumentException: Can't get JDBC type for array<string>

How can I convert the array of strings in the DataFrame into a string like swiming, cooking?

2 answers:

Answer 0: (score: 3)

You can use a simple udf to convert the array to a string:

import org.apache.spark.sql.functions._

// Join the elements of the array column into one comma-separated string
val value = udf((arr: Seq[String]) => arr.mkString(","))

val newDF = df.withColumn("hobbies", value($"hobbies"))
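
The schema in the question marks the hobbies array as nullable and its elements as containsNull, so the lambda passed to udf may receive a null array, in which case mkString would throw a NullPointerException. Below is a minimal sketch of a null-tolerant joining function in plain Scala, with no Spark dependency. The names JoinHobbies and joinHobbies are hypothetical and do not come from the post, and dropping null elements is one possible policy, not the only one.

```scala
// Plain Scala sketch of the joining logic; JoinHobbies/joinHobbies are hypothetical names.
object JoinHobbies {
  // Returns null for a null array so the resulting column stays nullable.
  // Null elements inside the array are dropped before joining (an assumed policy).
  def joinHobbies(arr: Seq[String], sep: String = ","): String =
    if (arr == null) null
    else arr.filter(_ != null).mkString(sep)

  def main(args: Array[String]): Unit = {
    println(joinHobbies(Seq("swiming", "cooking")))            // swiming,cooking
    println(joinHobbies(Seq("singing", null, "football"), ", ")) // singing, football
    println(joinHobbies(null))                                  // null
  }
}
```

Under those assumptions it could be wrapped as udf((arr: Seq[String]) => JoinHobbies.joinHobbies(arr)) and applied to the hobbies column in the same way as value above.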

Or, as Jacek suggested, you can use the concat_ws function:

df.withColumn("hobbies", concat_ws(",", col("hobbies")))

Output:

+-------------------+-----+
|            hobbies| name|
+-------------------+-----+
|    swiming,cooking|Johny|
|baseketball,fishing|James|
|   singing,football|  Tom|
+-------------------+-----+

Then save newDF with:

newDF.write.mode(SaveMode.Append).jdbc(url,"jsontest",prop)
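
The post never shows how url and prop are defined. As a sketch only, assuming a local MySQL server, a database named testdb, and MySQL Connector/J 8.x on the classpath (all hypothetical, as is the JdbcSettings name), they might be built like this:

```scala
import java.util.Properties

// Hypothetical connection settings; none of these values appear in the original post.
object JdbcSettings {
  // Assumed host, port, and database name; replace with real values.
  val url: String = "jdbc:mysql://localhost:3306/testdb"

  // Builds the java.util.Properties object that DataFrameWriter.jdbc expects.
  def connectionProperties(user: String, password: String): Properties = {
    val prop = new Properties()
    prop.setProperty("user", user)
    prop.setProperty("password", password)
    // Driver class for Connector/J 8.x (assumption); 5.x releases use com.mysql.jdbc.Driver.
    prop.setProperty("driver", "com.mysql.cj.jdbc.Driver")
    prop
  }
}
```

With these in place, url would be JdbcSettings.url and prop would be the result of JdbcSettings.connectionProperties called with real credentials.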

Answer 1: (score: 2)

How can I convert the array of strings in the DataFrame into a string like swiming, cooking?

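You should use the built-in concat_ws function.

concat_ws(sep: String, exprs: Column*): Column
Concatenates multiple input string columns together into a single string column, using the given separator.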

A solution could look like this:

val hobbies = Seq(
  (Array("swiming","cooking"), "Johny"),
  (Array("baseketball","fishing"), "James"),
  (Array("singing","football"), "Tom")
).toDF("hobbies", "name")

val solution = hobbies.select(concat_ws(",", $"hobbies") as "hobbies", $"name")
scala> solution.show
+-------------------+-----+
|            hobbies| name|
+-------------------+-----+
|    swiming,cooking|Johny|
|baseketball,fishing|James|
|   singing,football|  Tom|
+-------------------+-----+
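
For running outside the spark-shell, the same idea can be sketched as a standalone program. This assumes Spark 2.x or later (SparkSession rather than sqlContext) with spark-sql on the classpath. The object name, the flattenHobbies helper, and the local[1] master are illustrative choices, not part of either answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.sql.types.StringType

// Hypothetical standalone wrapper around the concat_ws approach.
object ConcatWsSketch {
  // Replace the array<string> column with a comma-separated string column.
  def flattenHobbies(df: DataFrame): DataFrame =
    df.withColumn("hobbies", concat_ws(",", col("hobbies")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumed local mode, for illustration only
      .appName("concat-ws-sketch")
      .getOrCreate()
    import spark.implicits._

    val hobbies = Seq(
      (Seq("swiming", "cooking"), "Johny"),
      (Seq("baseketball", "fishing"), "James"),
      (Seq("singing", "football"), "Tom")
    ).toDF("hobbies", "name")

    val solution = flattenHobbies(hobbies)

    // The column is now a plain string rather than array<string>.
    assert(solution.schema("hobbies").dataType == StringType)
    solution.show()

    spark.stop()
  }
}
```

Checking the column type before calling write.jdbc is a cheap way to confirm that no array column remains in the DataFrame being saved.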