Here are the table details:
Id Data
a {"col1":"11.0","col2":30.0}
a {"col1":"12.0","col2":10.0}
b {"col1":"11.0","col2":20.0}
b {"col1":"12.0","col2":25.0}
b {"col1":"15.0","col2":25.0}
c {"col1":"12.0","col2":15.0}
c {"col1":"13.0","col2":16.0}
Expected output: a list of the Data structures, grouped by Id.
Id Data
a list[ {"col1":"11.0","col2":30.0},{"col1":"12.0","col2":10.0}]
b list[ {"col1":"11.0","col2":20.0},{"col1":"12.0","col2":25.0},{"col1":"15.0","col2":25.0}]
c list[ {"col1":"12.0","col2":15.0},{"col1":"13.0","col2":16.0}]
Is this possible with a function Hive already supports, or do I need to write a user-defined function (UDF)?
Answer (score: 0)
The short answer is yes, Hive supports this without a UDF. It has been answered before; see here:
How to get array/bag of elements from Hive group by operator?
To summarize: if you only want unique elements, use collect_set; otherwise use collect_list (available only in Hive 0.13+). Apart from that, it is a standard GROUP BY query.
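A minimal sketch of that GROUP BY query, assuming the table is named my_table (a hypothetical name) and that both Id and Data are STRING columns, with Data holding the JSON text exactly as shown in the question:

```sql
-- Assumed (hypothetical) table: my_table(Id STRING, Data STRING)
-- collect_list gathers every Data value for each Id, keeping duplicates (Hive 0.13+)
SELECT Id,
       collect_list(Data) AS Data
FROM my_table
GROUP BY Id;

-- Variant: collect_set drops duplicate Data values within each Id group
SELECT Id,
       collect_set(Data) AS Data
FROM my_table
GROUP BY Id;
```

Both aggregates return an array per Id. Hive does not guarantee the order of elements inside that array, so if the order matters it has to be enforced separately.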