Spark DF to Hive ORC table - Map type column

Time: 2017-11-08 13:22:00

Tags: scala apache-spark spark-dataframe

I am trying to write a Map type column from a Spark DataFrame to a Hive ORC table, but the write fails with the error "Not matching column type".

Hive table:

CREATE EXTERNAL TABLE `default.test_map_col`(
test_col Map<String,String>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION '/hdfs/path'
TBLPROPERTIES (
'serialization.null.format'='')

Schema used to create the DF:

val schema = StructType(Seq(StructField("map_key_values", MapType(StringType, StringType), nullable = true)))

The map column in inputDF is populated as:

.withColumn("map_key_values", map(lit("testkey"), lit("testval")))
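
For reference, here is a minimal sketch of how such a column can be built and inspected. It assumes Spark 2.x or later and a local `SparkSession`; the object name `MapColumnSketch`, the helper `build`, and the one-row `spark.range(1)` source are illustrative and not part of the original post. Printing the schema shows the Spark-side type that has to line up with the Hive column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, map}

object MapColumnSketch {
  // Hypothetical helper: builds a one-row DataFrame with a single map column.
  def build(spark: SparkSession): DataFrame =
    spark.range(1)
      .select(map(lit("testkey"), lit("testval")).as("map_key_values"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use the cluster's session.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-column-sketch")
      .getOrCreate()

    // Prints the schema so the map column's key and value types can be checked.
    build(spark).printSchema()

    spark.stop()
  }
}
```

Note that the DataFrame column here is named `map_key_values` while the Hive column is `test_col`. Whether that matters depends on how the write is done: `insertInto` resolves columns by position, whereas `saveAsTable` matches them by name.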

I also tried using a UDF to populate the map column in the DF:

val toStruct = udf((c1: Map[String, String]) => c1.map {
case (k, v) => k + "\u0003" + v}.toSeq)
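
One thing worth checking with this UDF is its result type. The function returns a `Seq[String]`, which Spark represents as `array<string>` rather than `map<string,string>`, so despite the name `toStruct` the resulting column is neither a struct nor a map. The sketch below, under the same assumptions as before (Spark 2.x or later, local session; the object name `UdfResultTypeSketch`, the helper `build`, and the column name `flattened` are made up for illustration), prints both columns side by side.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map, udf}

object UdfResultTypeSketch {
  // Same UDF as in the question: joins each key/value pair with \u0003.
  val toStruct = udf((c1: Map[String, String]) =>
    c1.map { case (k, v) => k + "\u0003" + v }.toSeq)

  // Hypothetical helper: adds the UDF output next to the original map column.
  def build(spark: SparkSession): DataFrame =
    spark.range(1)
      .select(map(lit("testkey"), lit("testval")).as("map_key_values"))
      .withColumn("flattened", toStruct(col("map_key_values")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udf-result-type-sketch")
      .getOrCreate()

    // Prints the schema; compare the type of map_key_values with that of flattened.
    build(spark).printSchema()

    spark.stop()
  }
}
```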

Any ideas on how to write this?

0 answers:

No answers yet