Application Error Collection

Incremental updates in Hive

Time: 2016-05-02 19:29:13

Tags: mysql hadoop hive bigdata

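I have a source MySQL table whose data I need to export to Hive for analysis. Initially, when the data in MySQL was small, exporting it to Hive with Sqoop was not a problem. Now that the data volume has grown, how can I apply incremental updates from MySQL to Hive?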

2 answers:

Answer 0 (score: 0)

You can use Sqoop for incremental updates. The Sqoop documentation covers this well; here is the link: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
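As a rough illustration of what the linked section describes: with `--incremental append` you name a check column via `--check-column` and the last value already imported via `--last-value`, and Sqoop only imports rows whose check column is greater than that value (a `lastmodified` mode based on a timestamp column also exists). The sketch below models only that bookkeeping on an in-memory collection. `SourceRow` and `incrementalAppend` are hypothetical names for illustration, not part of Sqoop, and the sketch assumes the check column is a monotonically increasing key such as an auto-increment `id`.

```scala
// Conceptual model of Sqoop's `--incremental append` bookkeeping, NOT Sqoop code.
// Assumption: the check column is a monotonically increasing key (e.g. AUTO_INCREMENT id).
final case class SourceRow(id: Long, payload: String)

// Given the current source table and the last imported check-column value,
// return the rows a new run would import and the value to remember for the next run.
def incrementalAppend(source: Seq[SourceRow], lastValue: Long): (Seq[SourceRow], Long) = {
  val newRows = source.filter(_.id > lastValue)
  val nextLastValue = if (newRows.isEmpty) lastValue else newRows.map(_.id).max
  (newRows, nextLastValue)
}

// First run imports everything; the second run only picks up rows added since then.
val initial = Seq(SourceRow(1, "a"), SourceRow(2, "b"), SourceRow(3, "c"))
val (firstBatch, last1) = incrementalAppend(initial, 0L)   // ids 1, 2, 3; last1 = 3
val grown = initial ++ Seq(SourceRow(4, "d"), SourceRow(5, "e"))
val (secondBatch, last2) = incrementalAppend(grown, last1) // ids 4, 5; last2 = 5
println(s"second run imported ${secondBatch.map(_.id).mkString(", ")}; next last value $last2")
```

If the import is defined as a saved `sqoop job`, Sqoop stores the updated last value in the job for you; when running ad-hoc imports you have to track it and pass `--last-value` yourself, which is what `nextLastValue` stands in for here.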

Answer 1 (score: 0)

Here is an example of doing incremental updates with Hive/Spark.

    scala> spark.sql("select * from table1").show
    +---+---+---------+
    | id|sal|timestamp|
    +---+---+---------+
    |  1|100|    30-08|
    |  2|200|    30-08|
    |  3|300|    30-08|
    |  4|400|    30-08|
    +---+---+---------+

    scala> spark.sql("select * from table2").show
    +---+----+---------+
    | id| sal|timestamp|
    +---+----+---------+
    |  2| 300|    31-08|
    |  4|1000|    31-08|
    |  5| 500|    31-08|
    |  6| 600|    31-08|
    +---+----+---------+

    scala> spark.sql("select b.id,b.sal from table1 a full outer join table2 b on a.id = b.id where b.id is not null union select a.id,a.sal from table1 a full outer join table2 b on a.id = b.id where b.id is null").show
    +---+----+
    | id| sal|
    +---+----+
    |  4|1000|
    |  6| 600|
    |  2| 300|
    |  5| 500|
    |  1| 100|
    |  3| 300|
    +---+----+
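The query treats `table1` as the data already loaded and `table2` as the new batch (going by the `30-08`/`31-08` timestamps): for every `id` present in the batch, the batch row wins, and rows found only in `table1` are carried over unchanged. The same rule can be sketched on plain Scala collections, without a Spark session, which makes it easy to check against the result above. `SalaryRow` and `mergeIncremental` are illustrative names, and the sketch assumes each `id` appears at most once in the batch.

```scala
// One row of the example tables: `id` is the key, `sal` the value,
// `timestamp` the load date as printed in the tables above.
final case class SalaryRow(id: Int, sal: Int, timestamp: String)

// Merge an incremental batch into the existing snapshot:
// rows from `delta` replace base rows with the same id,
// base rows whose id is absent from `delta` are kept unchanged.
// Assumption: `delta` holds at most one row per id.
def mergeIncremental(base: Seq[SalaryRow], delta: Seq[SalaryRow]): Seq[SalaryRow] = {
  val deltaIds = delta.map(_.id).toSet
  val untouched = base.filterNot(row => deltaIds.contains(row.id))
  // Sorted only to make the output deterministic; Spark's row order is not guaranteed.
  (delta ++ untouched).sortBy(_.id)
}

val table1 = Seq(SalaryRow(1, 100, "30-08"), SalaryRow(2, 200, "30-08"), SalaryRow(3, 300, "30-08"), SalaryRow(4, 400, "30-08"))
val table2 = Seq(SalaryRow(2, 300, "31-08"), SalaryRow(4, 1000, "31-08"), SalaryRow(5, 500, "31-08"), SalaryRow(6, 600, "31-08"))

val merged = mergeIncremental(table1, table2)
// One line per row, ordered by id: 1|100, 2|300, 3|300, 4|1000, 5|500, 6|600
merged.foreach(row => println(s"${row.id}|${row.sal}"))
```

In Spark's DataFrame API, the `untouched` step corresponds roughly to a `left_anti` join of `table1` against `table2` on `id`, whose result is then unioned with `table2`; this avoids running the full outer join twice.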

Hope this logic works for you.