Question

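I have a table with hourly data. I want to get the number of hours, along with the col1 and col2 values for all hours collected into arrays. Input table: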

+-----+-----+-----+
| hour| col1| col2|
+-----+-----+-----+
| 00  | 0.0 | a   |
| 04  | 0.1 | b   |
| 08  | 0.2 | c   |
| 12  | 0.0 | d   |
+-----+-----+-----+

I am using the following query to get the column values as arrays:

Query:

select count(hr),
       map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col1 as string)))))) as col1_arr,
       map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col2 as string)))))) as col2_arr
from table;

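With this query, the values in col2_arr do not come out in the same order as those in col1_arr (output below). Please suggest how to collect the values of different columns into arrays/lists in the same order.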

+----------+-----------------+----------+
| count(hr)| col1_arr        | col2_arr | 
+----------+-----------------+----------+
| 4        | 0.0,0.1,0.2,0.0 | b,a,c,d  | 
+----------+-----------------+----------+
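Why the order drifts can be modeled in a few lines. This Python sketch is only an illustration, not Hive code: the function name `collect_with_random_keys` is hypothetical, and it stands in for the unspecified order of the collected set and the resulting map by iterating values in the order of their random keys (an assumption, not documented Hive behavior). What it shows is that the UUID prefix does keep duplicates such as 0.0 alive through collect_set, but each column gets its own independent random keys, so nothing ties position i of col1_arr to position i of col2_arr.

```python
import random
import uuid

# Input rows from the question: (hour, col1, col2)
ROWS = [("00", "0.0", "a"), ("04", "0.1", "b"), ("08", "0.2", "c"), ("12", "0.0", "d")]


def collect_with_random_keys(values, rng):
    """Hypothetical model of map_values(str_to_map(collect_set(randomUUID:value))).

    Every value gets a fresh random key, so duplicate values survive the
    "set". ASSUMPTION: the unspecified result order is modeled as iterating
    the values in the order of their random keys.
    """
    keyed = {uuid.UUID(int=rng.getrandbits(128)): v for v in values}
    return [keyed[k] for k in sorted(keyed)]


rng = random.Random(42)  # seeded only so the sketch is repeatable
col1_arr = collect_with_random_keys([r[1] for r in ROWS], rng)
col2_arr = collect_with_random_keys([r[2] for r in ROWS], rng)

# Every value is kept, including the duplicate 0.0 ...
print(sorted(col1_arr), sorted(col2_arr))
# ... but each column was permuted by its own unrelated keys, so the
# i-th entries of the two lists need not come from the same input row.
print(col1_arr, col2_arr)
```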

Required output:

+----------+-----------------+----------+
| count(hr)| col1_arr        | col2_arr | 
+----------+-----------------+----------+
| 4        | 0.0,0.1,0.2,0.0 | a,b,c,d  | 
+----------+-----------------+----------+

Thanks.

Answer 1

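select  count(*) as cnt
        -- the hours themselves, sorted
       ,concat_ws(',',sort_array(collect_list(hour)))  as hour
        -- tag each value with its hour ("04:0.1"), sort the tagged strings so every
        -- column follows hour order, then strip the two-character "hh:" prefix with '..:'
       ,regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',hour,cast(col1 as string))))),'..:','') as col1
       ,regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',hour,col2)))),'..:','') as col2
from    mytable
;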
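The idea in this query is that every column is sorted by the same key, the hour prefix, so all arrays end up in hour order. A minimal Python model of one column expression follows; `ordered_column` is a hypothetical name, and the values are kept as plain strings (so 0.0 stays "0.0", whereas the Hive output below shows how the cast actually rendered it).

```python
import re

# Input rows from the question: (hour, col1, col2)
ROWS = [("00", "0.0", "a"), ("04", "0.1", "b"), ("08", "0.2", "c"), ("12", "0.0", "d")]


def ordered_column(rows, col_idx):
    """Model of regexp_replace(concat_ws(',', sort_array(collect_list(
    concat_ws(':', hour, col)))), '..:', '')."""
    # concat_ws(':', hour, col): tag each value with its row's hour, e.g. "04:0.1"
    tagged = [f"{r[0]}:{r[col_idx]}" for r in rows]
    # sort_array: plain string sort, so the two-digit hour prefix decides the order
    joined = ",".join(sorted(tagged))
    # regexp_replace(..., '..:', ''): remove every "any two chars + colon" prefix
    return re.sub(r"..:", "", joined)


hours = ",".join(sorted(r[0] for r in ROWS))
col1 = ordered_column(ROWS, 1)
col2 = ordered_column(ROWS, 2)
print(hours, col1, col2)  # 00,04,08,12 0.0,0.1,0.2,0.0 a,b,c,d
```

The '..:' pattern relies on every hour being exactly two characters and on the values themselves containing no ':'; a value like "ab:c" would also lose its "ab:" part.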

+-----+-------------+-------------+---------+
| cnt |    hour     |    col1     |  col2   |
+-----+-------------+-------------+---------+
|   4 | 00,04,08,12 | 0,0.1,0.2,0 | a,b,c,d |
+-----+-------------+-------------+---------+
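The same invariant (one shared ordering for all columns) can be shown more directly by sorting whole rows once and then splitting them into columns, with no prefixes to add or strip. The Python below is only a model of that idea with the input deliberately shuffled; it is not the answer's query, and doing the equivalent in Hive (collecting and sorting one composite value per row) depends on what your Hive version supports.

```python
# Input rows from the question, shuffled: (hour, col1, col2)
ROWS = [("12", "0.0", "d"), ("00", "0.0", "a"), ("08", "0.2", "c"), ("04", "0.1", "b")]

# Sort complete rows once by hour; every column then inherits the same order.
ordered = sorted(ROWS, key=lambda r: r[0])
hours, col1, col2 = (list(col) for col in zip(*ordered))

print(len(ordered), ",".join(hours), ",".join(col1), ",".join(col2))
# 4 00,04,08,12 0.0,0.1,0.2,0.0 a,b,c,d
```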
