I have a table in Hive with the following structure:
+------+------+--------+-------+
| col1 | col2 | col3 | Value |
+------+------+--------+-------+
| 1 | A1 | M1 | 23 |
| 1 | A1 | M1_LW | 25 |
| 1 | A1 | M1_L2W | 22 |
| 1 | A1 | M2 | 17 |
| 1 | A1 | M2_LW | 21 |
| 1 | A1 | M2_L2W | 13 |
| 1 | A1 | M3 | 16 |
| 1 | A1 | M3_LW | 30 |
| 1 | A1 | M3_L2W | 11 |
| 2 | A2 | M1 | 22 |
| 2 | A2 | M1_LW | 22 |
| 2 | A2 | M1_L2W | 10 |
| 2 | A2 | M2 | 14 |
| 2 | A2 | M2_LW | 25 |
| 2 | A2 | M2_L2W | 23 |
| 2 | A2 | M3 | 10 |
| 2 | A2 | M3_LW | 20 |
| 2 | A2 | M3_L2W | 25 |
+------+------+--------+-------+
This structure is fine for querying, but for a specific reporting requirement the table needs to be converted to the following:
+------+------+-------+----+----+----+
| col1 | col2 | col3 | M1 | M2 | M3 |
+------+------+-------+----+----+----+
| 1 | A1 | Today | 23 | 17 | 16 |
| 1 | A1 | LW | 25 | 21 | 30 |
| 1 | A1 | L2W | 22 | 13 | 11 |
| 2 | A2 | Today | 22 | 14 | 10 |
| 2 | A2 | LW | 22 | 25 | 20 |
| 2 | A2 | L2W | 10 | 23 | 25 |
+------+------+-------+----+----+----+
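Is this possible using the built-in functions available in Hive? I have tried a pivot combined with a union of the results, but that becomes a performance overhead. I also tried the to_map UDAF, but the Hive version in use does not seem to support it. Any suggestions would be appreciated.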
Answer 0 (score: 0)
You can use case expressions and conditional aggregation:

    select col1, col2,
           (case when col3 like '%_LW' then 'LW'
                 when col3 like '%_L2W' then 'L2W'
                 else 'Today'
            end) as col3,
           max(case when col3 like 'M1%' then value end) as m1,
           max(case when col3 like 'M2%' then value end) as m2,
           max(case when col3 like 'M3%' then value end) as m3
    from t
    group by col1, col2,
             (case when col3 like '%_LW' then 'LW'
                   when col3 like '%_L2W' then 'L2W'
                   else 'Today'
              end)
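As a rough sanity check, the conditional-aggregation query can be run against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Hive, so it only shows that the case/max logic yields the desired layout; it says nothing about Hive performance. The table name t comes from the answer, while the alias period (instead of reusing col3) and the helper name pivot are assumptions made for this illustration. Note that `_` in a LIKE pattern is a single-character wildcard, which happens to be harmless for these suffixes.

```python
import sqlite3

# Sample rows copied from the question: (col1, col2, col3, value).
SAMPLE = [
    (1, "A1", "M1", 23), (1, "A1", "M1_LW", 25), (1, "A1", "M1_L2W", 22),
    (1, "A1", "M2", 17), (1, "A1", "M2_LW", 21), (1, "A1", "M2_L2W", 13),
    (1, "A1", "M3", 16), (1, "A1", "M3_LW", 30), (1, "A1", "M3_L2W", 11),
    (2, "A2", "M1", 22), (2, "A2", "M1_LW", 22), (2, "A2", "M1_L2W", 10),
    (2, "A2", "M2", 14), (2, "A2", "M2_LW", 25), (2, "A2", "M2_L2W", 23),
    (2, "A2", "M3", 10), (2, "A2", "M3_LW", 20), (2, "A2", "M3_L2W", 25),
]

# Same shape as the answer's query; the output alias is "period"
# (hypothetical name) to avoid clashing with the source column col3.
QUERY = """
select col1, col2,
       (case when col3 like '%_LW' then 'LW'
             when col3 like '%_L2W' then 'L2W'
             else 'Today'
        end) as period,
       max(case when col3 like 'M1%' then value end) as m1,
       max(case when col3 like 'M2%' then value end) as m2,
       max(case when col3 like 'M3%' then value end) as m3
from t
group by col1, col2,
         (case when col3 like '%_LW' then 'LW'
               when col3 like '%_L2W' then 'L2W'
               else 'Today'
          end)
"""


def pivot(rows):
    """Load rows into an in-memory table t and return the pivoted rows, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (col1 integer, col2 text, col3 text, value integer)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", rows)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


result = pivot(SAMPLE)
for row in result:
    print(row)
```

Each (col1, col2) pair should come back as three rows, one per period, with the M1, M2 and M3 values side by side, matching the target table in the question.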