I want to save JSON data into a table in MySQL.
After some reading, I found that the way to load the data into MySQL is JSON -> DataFrame -> MySQL. My test.json file contains:
{"name":"Johny","hobbies":["swiming","cooking"]}
{"name":"James","hobbies":["baseketball","fishing"]}
{"name":"Tom","hobbies":["singing","football"]}
I read the JSON file with the following commands:
val df = sqlContext.read.json("test.json")
df.show()
df.printSchema()
and get this output:
+--------------------+-----+
|             hobbies| name|
+--------------------+-----+
|  [swiming, cooking]|Johny|
|[baseketball, fis...|James|
| [singing, football]|  Tom|
+--------------------+-----+
root
|-- hobbies: array (nullable = true)
| |-- element: string (containsNull = true)
|-- name: string (nullable = true)
When I then run the following commands:
df.registerTempTable("mytable")
sqlContext.sql("SELECT * FROM mytable")
  .write
  .mode(SaveMode.Append)
  .jdbc(url, "jsontest", prop)
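For context, url and prop are not defined in the question; they would typically be set up along these lines (host, database name, and credentials below are hypothetical placeholders):

```scala
import java.util.Properties

// Hypothetical JDBC connection settings; adjust host, database, and credentials.
val url = "jdbc:mysql://localhost:3306/test"
val prop = new Properties()
prop.setProperty("user", "root")
prop.setProperty("password", "secret")
prop.setProperty("driver", "com.mysql.jdbc.Driver")
```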
I get the following error:

java.lang.IllegalArgumentException: Can't get JDBC type for array<string>

How can I convert the array of strings into a string like swiming, cooking in the DataFrame?
Answer 0 (score: 3)
You can convert the array<string> column to a string with a simple udf:
import org.apache.spark.sql.functions._
val value = udf((arr: Seq[String]) => arr.mkString(", "))
val newDf = df.withColumn("hobbies", value($"hobbies"))
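Per row, the udf simply joins the array's elements with mkString; a minimal plain-Scala sketch (no Spark required) of what it does to one value:

```scala
// What the udf computes for a single row's hobbies array:
val hobbies = Seq("swiming", "cooking")
val joined = hobbies.mkString(", ")
println(joined)  // swiming, cooking
```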
Or, as Jacek said, you can use the concat_ws function (note that it takes the separator as its first argument):
df.withColumn("hobbies", concat_ws(", ", col("hobbies")))
Output:
+--------------------+-----+
|             hobbies| name|
+--------------------+-----+
|    swiming, cooking|Johny|
|baseketball, fishing|James|
|   singing, football|  Tom|
+--------------------+-----+
Then save newDf with:
newDf.write.mode(SaveMode.Append).jdbc(url, "jsontest", prop)
Answer 1 (score: 2)
How can I convert the array of strings into a string like swiming, cooking in the DataFrame?
You should use the built-in concat_ws function:
concat_ws(sep: String, exprs: Column*): Column
Concatenates multiple input string columns together into a single string column, using the given separator.
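As a rough plain-Scala analogy (no Spark needed): concat_ws joins its non-null inputs with the separator, skipping null values rather than propagating null:

```scala
// Plain-Scala analogy of concat_ws(",", ...): join non-null parts, skip nulls.
val parts: Seq[String] = Seq("swiming", null, "cooking")
val joined = parts.filter(_ != null).mkString(",")
println(joined)  // swiming,cooking
```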
The solution is as follows:
import org.apache.spark.sql.functions.concat_ws
import sqlContext.implicits._  // for toDF and the $-notation

val hobbies = Seq(
  (Array("swiming", "cooking"), "Johny"),
  (Array("baseketball", "fishing"), "James"),
  (Array("singing", "football"), "Tom")
).toDF("hobbies", "name")

val solution = hobbies.select(concat_ws(",", $"hobbies") as "hobbies", $"name")
scala> solution.show
+-------------------+-----+
|            hobbies| name|
+-------------------+-----+
|    swiming,cooking|Johny|
|baseketball,fishing|James|
|   singing,football|  Tom|
+-------------------+-----+