I am trying to write a Map-type column from a Spark DataFrame to a Hive ORC table, but it fails with the error "Not matching column type".
Hive table:
CREATE EXTERNAL TABLE `default.test_map_col`(
test_col Map<String,String>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION '/hdfs/path'
TBLPROPERTIES (
'serialization.null.format'='')
Schema used to create the DF:
schema = StructType(Seq(StructField("map_key_values", MapType(StringType, StringType), nullable = true)))
The map column in inputDF is populated as:
("map_key_values", map(lit("testkey"), lit("testval")))
I also tried populating the map column in the DF via a UDF:
val toStruct = udf((c1: Map[String, String]) => c1.map {
  case (k, v) => k + "\u0003" + v
}.toSeq)
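For reference, the UDF body itself runs fine as plain Scala; a minimal sketch (the object name and sample map are hypothetical, not part of the original post) showing what it produces, namely a Seq[String] rather than a Map:

```scala
// Minimal pure-Scala sketch of the UDF's transformation logic,
// with a hypothetical sample map for illustration.
object MapToDelimitedSketch {
  // Joins each key/value pair with the '\u0003' map-key delimiter,
  // mirroring the UDF above; note the result is a Seq[String], not a Map.
  def toDelimited(c1: Map[String, String]): Seq[String] =
    c1.map { case (k, v) => k + "\u0003" + v }.toSeq

  def main(args: Array[String]): Unit =
    println(toDelimited(Map("testkey" -> "testval")))
}
```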
Any ideas on how to write this?