How to update a table in Hive 0.13?

Date: 2017-06-26 05:25:58

Tags: hadoop hive

My Hive version is 0.13. I have two tables, table_1 and table_2.

table_1 contains:

customer_id | items | price | updated_date
------------+-------+-------+-------------
10          | watch | 1000  | 20170626
11          | bat   | 400   | 20170625

table_2 contains:

customer_id | items    | price | updated_date
------------+----------+-------+-------------
10          | computer | 20000 | 20170624

If a customer_id from table_2 already exists in table_1, I want to update that record in table_1; if it does not exist, the record should be appended to table_1.

Since Hive 0.13 does not support UPDATE, I tried using a join, but it failed.

1 answer:

Answer 0 (score: 3)

You can use either row_number or a full join. Here is an example using row_number:
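-- Rebuild table_1 from the old rows (table_1) plus the new rows (table_2).
-- Hive requires the TABLE keyword in INSERT OVERWRITE TABLE.
insert overwrite table table_1
select customer_id, items, price, updated_date
from
(
select customer_id, items, price, updated_date,
-- Within each customer_id, the table_2 row (new_flag = 1) is ranked first.
       row_number() over(partition by customer_id order by new_flag desc) rn
from 
    (
     select customer_id, items, price, updated_date, 0 as new_flag
       from table_1
     union all
     select customer_id, items, price, updated_date, 1 as new_flag
       from table_2
    ) all_data
-- Keep one row per customer_id: the table_2 row if it exists, else the table_1 row.
)s where rn=1;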
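To see what this query produces on the sample data from the question, here is a minimal sketch that runs the same SELECT through Python's built-in sqlite3 module. SQLite is swapped in for Hive purely so the example runs locally: it has no INSERT OVERWRITE, so the merged rows are only read back, and an ORDER BY is added for a stable result. The helper name `merge` is hypothetical, and the sketch assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Sample data from the question: (customer_id, items, price, updated_date).
TABLE_1 = [
    (10, "watch", 1000, "20170626"),
    (11, "bat", 400, "20170625"),
]
TABLE_2 = [
    (10, "computer", 20000, "20170624"),
]

# The SELECT part of the answer's query. The trailing ORDER BY is an addition
# for a deterministic result; Hive's INSERT OVERWRITE does not need it.
MERGE_SQL = """
select customer_id, items, price, updated_date
from (
    select customer_id, items, price, updated_date,
           row_number() over (partition by customer_id order by new_flag desc) rn
    from (
        select customer_id, items, price, updated_date, 0 as new_flag from table_1
        union all
        select customer_id, items, price, updated_date, 1 as new_flag from table_2
    ) all_data
) s
where rn = 1
order by customer_id
"""


def merge(table_1_rows, table_2_rows):
    """Return the rows the query would write into table_1 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("table_1", table_1_rows), ("table_2", table_2_rows)):
            conn.execute(
                f"create table {name} (customer_id integer, items text, "
                "price integer, updated_date text)"
            )
            conn.executemany(f"insert into {name} values (?, ?, ?, ?)", rows)
        return conn.execute(MERGE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in merge(TABLE_1, TABLE_2):
        print(row)
```

Running it prints `(10, 'computer', 20000, '20170624')` and `(11, 'bat', 400, '20170625')`: customer 10 takes table_2's row and customer 11 is kept from table_1. A customer_id that appears only in table_2 would simply be appended, because its single row gets rn = 1.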

See also this answer for doing the update with a FULL JOIN: https://stackoverflow.com/a/37744071/2700344
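The linked FULL JOIN answer is not reproduced here. As a hedged illustration of the semantics such an upsert relies on (table_1 and table_2 matched on customer_id, every key from either side kept, and table_2's values taking precedence on a match), here is a plain-Python sketch. It is not HiveQL; the function name `full_join_merge` is hypothetical, and it assumes customer_id is unique within each table.

```python
from typing import Dict, List, Tuple

# (customer_id, items, price, updated_date)
Row = Tuple[int, str, int, str]


def full_join_merge(table_1_rows: List[Row], table_2_rows: List[Row]) -> List[Row]:
    """Mimic a FULL OUTER JOIN on customer_id in which table_2 wins on a match.

    Assumes customer_id is unique within each input (duplicates would collapse
    to the last row seen).
    """
    left: Dict[int, Row] = {row[0]: row for row in table_1_rows}
    right: Dict[int, Row] = {row[0]: row for row in table_2_rows}
    merged: List[Row] = []
    # A full outer join keeps every key that appears on either side.
    for customer_id in sorted(left.keys() | right.keys()):
        # Like CASE WHEN t2.customer_id IS NOT NULL THEN t2.<col> ELSE t1.<col> END
        if customer_id in right:
            merged.append(right[customer_id])
        else:
            merged.append(left[customer_id])
    return merged
```

Deciding per row on whether the table_2 side matched, rather than applying COALESCE to each column separately, avoids mixing values from the two tables when a table_2 column happens to be NULL.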