With Spark SQL on Hive, when I use named_struct in my query, it returns results:
SELECT id, collect_set(emp_info) as employee_info
FROM
(
SELECT t.id, named_struct("name", t.emp_name, "dept", t.emp_dept) AS emp_info
FROM mytable t
) a
GROUP BY id
But when I replace named_struct with map, I get an error:
SELECT id, collect_set(emp_info) as employee_info
FROM
(
SELECT t.id, map("name", t.emp_name, "dept", t.emp_dept) AS emp_info
FROM mytable t
) a
GROUP BY id
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'collect_set(a.`emp_info`)' due to data type mismatch: collect_set() cannot have map type data; line 36 pos 27;
'Distinct
I want to return a map of name and dept. How can I use a map with collect_set?
FYI: the same query with map runs without problems in Hive (Hue).
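One possible workaround, sketched under assumptions: since collect_set accepts the named_struct version, deduplicate the structs first and only convert each struct into a map afterwards. The error suggests Spark refuses map values in collect_set, presumably because it needs to compare elements to deduplicate them. This sketch assumes Spark 2.4 or later (for the higher-order function transform and the s -> ... lambda syntax) and that emp_name and emp_dept share a type, which map() already requires in the original query.

```sql
-- Sketch: assumes Spark 2.4+ for transform() and lambda syntax.
-- Assumes emp_name and emp_dept have the same type, as map() requires.
SELECT id,
       transform(
         -- collect_set over structs is allowed, so deduplicate here
         collect_set(named_struct('name', t.emp_name, 'dept', t.emp_dept)),
         -- then turn each deduplicated struct into a map
         s -> map('name', s.name, 'dept', s.dept)
       ) AS employee_info
FROM mytable t
GROUP BY id
```

If duplicates are acceptable, collect_list does not reject map-typed input in Spark, so collect_list(map(...)) is a simpler alternative, though it does not deduplicate.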