Sqoop incremental import in 'lastmodified' mode ignores the last-value parameter

Time: 2019-05-17 15:45:45

Tags: sqoop cloudera-quickstart-vm

I want to import only the newly added rows from a MySQL table (id >= 110, created_date > 2019-05-17 08:07:13).

+---------------+--------------------+---------------------+
| department_id | department_name    | created_date        |
+---------------+--------------------+---------------------+
|             2 | Fitness            | 2019-05-17 08:07:13 |
|             3 | Footwear           | 2019-05-17 08:07:13 |
|             4 | Apparel            | 2019-05-17 08:07:13 |
|             5 | Golf               | 2019-05-17 08:07:13 |
|           ... |        ...         |         ...         |
|            23 | Science            | 2019-05-17 08:07:13 |
|            24 | Engineering        | 2019-05-17 08:07:13 |
|           110 | Civil              | 2019-05-17 08:10:00 | <<-+- new records
|           111 | Mechanical         | 2019-05-17 08:10:00 |    |
|           112 | Automobile         | 2019-05-17 08:10:00 |    |
|           113 | Pharma             | 2019-05-17 08:10:00 |    |
|           114 | Social Engineering | 2019-05-17 08:10:01 | <<-+
+---------------+--------------------+---------------------+

However, a Sqoop incremental import in lastmodified mode imports all of the records.

sqoop import \
  --connect ... \
  --table departments_new \
  --target-dir /user/hive/warehouse/tmp9.db/ \
  -m 1 \
  --append \
  --incremental lastmodified \
  --check-column created_date \
  --last-value '2019-05-17 08:07:13' \
  --split-by 'department_id';

I expected this command to import only the records with a created_date later than the last-value. Instead, the result looks like this:

2,Fitness,2019-05-17 08:07:13.0         <<-+
...                                        |
22,Maths,2019-05-17 08:07:13.0             |
23,Science,2019-05-17 08:07:13.0           |
24,Engineering,2019-05-17 08:07:13.0    <<-+- this should not be here
110,Civil,2019-05-17 08:10:00.0
111,Mechanical,2019-05-17 08:10:00.0
112,Automobile,2019-05-17 08:10:00.0
113,Pharma,2019-05-17 08:10:00.0
114,Social Engineering,2019-05-17 08:10:01.0

What am I doing wrong?
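One thing worth checking is whether the lower bound is treated as inclusive. The unwanted rows all carry a created_date exactly equal to the --last-value that was passed, so a filter of the form created_date >= last-value would keep them, while created_date > last-value would not. The sketch below does not run Sqoop. It only replays both comparisons with awk on a few rows copied from the table above. That Sqoop's lastmodified mode uses the inclusive form is an assumption here, not something this post confirms, and the variable names are illustrative.

```shell
#!/bin/sh
# A few rows from the question's table: department_id,department_name,created_date
rows='24,Engineering,2019-05-17 08:07:13
110,Civil,2019-05-17 08:10:00
114,Social Engineering,2019-05-17 08:10:01'

# The value passed as --last-value in the question.
last_value='2019-05-17 08:07:13'

# Inclusive lower bound (created_date >= last_value): the boundary row is kept.
# ISO-style timestamps compare correctly as plain strings.
inclusive=$(printf '%s\n' "$rows" | awk -F, -v lv="$last_value" '$3 >= lv')

# Exclusive lower bound (created_date > last_value): the boundary row is dropped.
exclusive=$(printf '%s\n' "$rows" | awk -F, -v lv="$last_value" '$3 > lv')

printf 'inclusive (>=):\n%s\n\nexclusive (>):\n%s\n' "$inclusive" "$exclusive"
```

If the inclusive comparison does turn out to be the cause, passing a --last-value slightly later than the old rows' timestamp (for example '2019-05-17 08:07:14') should leave them out. The bounds Sqoop reports in its job output would confirm or rule this out.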

0 answers:

No answers.