Unique IDs when creating a Hive table from CSV files

Time: 2015-06-02 06:51:50

Tags: csv hadoop hive

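I have a set of CSV files that I want to load into a Hive table, but I am fairly sure that some of the records in the CSVs are duplicates. Each record/row in the CSVs is identified by a key, and I want to build the table using that key as the primary key. How can I create the Hive table so that it contains no duplicate rows?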

1 answer:

Answer 0: (score: 0)

ROW_NUMBER() OVER([partition_by_clause] order_by_clause)

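Returns an ascending sequence of integers starting at 1. When a partition clause is given, the sequence restarts at 1 for each partition.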

select x, row_number() over(order by x, property) as row_number, property from int_t;
+----+------------+----------+
| x  | row_number | property |
+----+------------+----------+
| 1  | 1          | odd      |
| 1  | 2          | square   |
| 2  | 3          | even     |
| 2  | 4          | prime    |
| 3  | 5          | odd      |
| 3  | 6          | prime    |
| 4  | 7          | even     |
| 4  | 8          | square   |
| 5  | 9          | odd      |
| 5  | 10         | prime    |
| 6  | 11         | even     |
| 6  | 12         | perfect  |
| 7  | 13         | lucky    |
| 7  | 14         | lucky    |
| 7  | 15         | lucky    |
| 7  | 16         | odd      |
| 7  | 17         | prime    |
| 8  | 18         | even     |
| 9  | 19         | odd      |
| 9  | 20         | square   |
| 10 | 21         | even     |
| 10 | 22         | round    |
+----+------------+----------+
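
To tie this back to the question: Hive does not enforce primary-key uniqueness on load, so one common approach is to number the rows within each key and keep only the first one. The following is a minimal sketch under stated assumptions, not a definitive implementation. The table names `staging_csv` and `deduped`, the columns `record_key` and `payload`, the comma delimiter, and the HDFS path are all hypothetical placeholders for your own schema and file location.

```sql
-- Hypothetical external table over the raw CSV files.
-- Table name, column names, delimiter, and LOCATION are assumptions.
CREATE EXTERNAL TABLE staging_csv (
  record_key STRING,
  payload    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/csv_input';

-- Number the rows inside each key, then keep only row 1 per key.
-- ORDER BY payload merely makes the choice of survivor deterministic;
-- substitute whatever column should decide which duplicate wins.
CREATE TABLE deduped AS
SELECT record_key, payload
FROM (
  SELECT record_key,
         payload,
         ROW_NUMBER() OVER (PARTITION BY record_key ORDER BY payload) AS rn
  FROM staging_csv
) ranked
WHERE rn = 1;
```

If the duplicate rows are identical in every column, `SELECT DISTINCT` over the staging table would be enough. The `ROW_NUMBER()` form is useful when rows share a key but differ elsewhere and exactly one of them must be kept.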