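I have a list of CSV files that I want to export to a Hive table, but I am fairly sure that some of the records in the CSVs are redundant. Each record/row in the CSVs is identified by a key, and I want to generate the table using that key as the primary key. How can I generate the Hive table so that it contains no duplicate rows?
Answer 0 (score: 0)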
ROW_NUMBER() OVER([partition_by_clause] order_by_clause)
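Returns an ascending sequence of integers, starting with 1. When a PARTITION BY clause is given, the sequence restarts at 1 for each partition.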
select x, row_number() over(order by x, property) as row_number, property from int_t;
+----+------------+----------+
| x  | row_number | property |
+----+------------+----------+
| 1  | 1          | odd      |
| 1  | 2          | square   |
| 2  | 3          | even     |
| 2  | 4          | prime    |
| 3  | 5          | odd      |
| 3  | 6          | prime    |
| 4  | 7          | even     |
| 4  | 8          | square   |
| 5  | 9          | odd      |
| 5  | 10         | prime    |
| 6  | 11         | even     |
| 6  | 12         | perfect  |
| 7  | 13         | lucky    |
| 7  | 14         | lucky    |
| 7  | 15         | lucky    |
| 7  | 16         | odd      |
| 7  | 17         | prime    |
| 8  | 18         | even     |
| 9  | 19         | odd      |
| 9  | 20         | square   |
| 10 | 21         | even     |
| 10 | 22         | round    |
+----+------------+----------+
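To tie this back to the question: Hive does not enforce primary-key uniqueness on load, so the duplicates have to be dropped in the query itself. One way to do that, sketched below, is to number the rows within each key using PARTITION BY and keep only the row numbered 1. All table names, column names, and the path here (csv_staging, deduped_records, record_key, payload, /path/to/csv/dir) are hypothetical placeholders, not anything from the question. The sketch also assumes simple comma-delimited files with no quoted fields.

```sql
-- Hypothetical staging table that reads the raw CSV files in place.
-- Adjust the columns, delimiter, and LOCATION to match the real data.
CREATE EXTERNAL TABLE csv_staging (
  record_key STRING,
  payload    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/csv/dir';

-- Build the final table with exactly one row per key.
-- ROW_NUMBER() restarts at 1 for every distinct record_key,
-- so filtering on rn = 1 keeps a single row from each group.
CREATE TABLE deduped_records AS
SELECT record_key, payload
FROM (
  SELECT
    record_key,
    payload,
    ROW_NUMBER() OVER (PARTITION BY record_key ORDER BY payload) AS rn
  FROM csv_staging
) ranked
WHERE rn = 1;
```

The ORDER BY inside the window decides which duplicate survives. Here it simply picks the row with the smallest payload; if the data has a timestamp or version column, ordering by that column descending would keep the most recent record instead.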