Sqoop incremental lastmodified import does not update existing records

Time: 2016-06-04 16:00:30

Tags: sqoop

I am trying to run sqoop import with --incremental lastmodified to pick up both updated and inserted records. Here is the table:

mysql> describe inc_lastmod_test;
+-------------------+-------------+------+-----+-------------------+-----------------------------+
| Field             | Type        | Null | Key | Default           | Extra                       |
+-------------------+-------------+------+-----+-------------------+-----------------------------+
| id                | int(11)     | NO   | PRI | 0                 |                             |
| value             | varchar(20) | YES  |     | NULL              |                             |
| last_updated_date | timestamp   | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------------+-------------+------+-----+-------------------+-----------------------------+

The table currently contains:

mysql> select * from inc_lastmod_test;
+----+----------+---------------------+
| id | value    | last_updated_date   |
+----+----------+---------------------+
|  1 | first    | 2016-01-01 00:00:00 |
|  2 | second   | 2016-06-04 06:56:49 |
|  3 | newthird | 2016-06-04 07:06:40 |
|  4 | fourth   | 2016-01-04 00:00:00 |
|  5 | fifth    | 2016-01-05 00:00:00 |
|  6 | sixsth   | 2016-01-06 00:00:00 |
|  7 | seventh  | 2016-01-07 00:00:00 |
|  8 | eighth   | 2016-06-04 07:04:58 |
+----+----------+---------------------+

I had previously imported the table into HDFS, where it is in the following state:

[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/inc_lastmod_test/part*
1,first,2016-01-01 00:00:00.0
2,second,2016-06-04 06:56:49.0
3,third,2016-01-03 00:00:00.0
4,fourth,2016-01-04 00:00:00.0
5,fifth,2016-01-05 00:00:00.0
6,sixsth,2016-01-06 00:00:00.0
7,seventh,2016-01-07 00:00:00.0

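Given the current state of the MySQL table, I expect the HDFS directory to end up with:
1) a new record for id = 8
2) an updated record for id = 3 (changed value)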

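However, after running the following command, it looks as if the fetched records were simply inserted and the existing records were not updated: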

[cloudera@quickstart ~]$ sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00"

Can you tell me what I am doing wrong?

2 answers:

Answer 0: (score: 0)

Try this:

[cloudera@quickstart ~]$ sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00.0"

Answer 1: (score: 0)

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00"

An incremental import does not modify the existing files (/user/cloudera/inc_lastmod_test/part*). It fetches all records modified on or after the given date and writes them to new files.
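To make that behaviour concrete, here is a minimal local sketch. It uses plain files in a temporary directory instead of HDFS and does not call Sqoop at all. The file names part-m-00000 and part-m-00001 are illustrative assumptions, and the awk step is only a stand-in for a merge on the id column, not Sqoop's own implementation. The rows come from the question: the first file is the earlier import, the second holds the rows whose last_updated_date is on or after the --last-value.

```shell
#!/bin/sh
# Local stand-in for the HDFS target directory (assumption: plain files, no HDFS).
workdir=$(mktemp -d)

# Rows already present from the earlier import (copied from the question).
cat > "$workdir/part-m-00000" <<'EOF'
1,first,2016-01-01 00:00:00.0
2,second,2016-06-04 06:56:49.0
3,third,2016-01-03 00:00:00.0
4,fourth,2016-01-04 00:00:00.0
5,fifth,2016-01-05 00:00:00.0
6,sixsth,2016-01-06 00:00:00.0
7,seventh,2016-01-07 00:00:00.0
EOF

# Rows with last_updated_date on or after "2016-01-08 00:00:00",
# written to a NEW file, as an appending incremental import would do.
cat > "$workdir/part-m-00001" <<'EOF'
2,second,2016-06-04 06:56:49.0
3,newthird,2016-06-04 07:06:40.0
8,eighth,2016-06-04 07:04:58.0
EOF

# What a reader of part* sees after an append: old and new rows side by side.
appended=$(cat "$workdir"/part-m-*)

# Stand-in for a merge on id: keep, per id, the row with the latest timestamp.
# The timestamps are in ISO order, so plain string comparison is sufficient.
merged=$(cat "$workdir"/part-m-* |
  awk -F, '{ if (!($1 in ts) || $3 >= ts[$1]) { ts[$1] = $3; row[$1] = $0 } }
           END { for (k in row) print row[k] }' |
  sort -t, -k1,1n)

printf 'After append:\n%s\n\n' "$appended"
printf 'After merge on id:\n%s\n' "$merged"

rm -rf "$workdir"
```

In the appended view, id = 3 appears twice (third and newthird), which matches what the question describes. In the merged view each id appears once with its latest value. Sqoop provides a --merge-key option for lastmodified imports and a separate sqoop merge tool for this kind of reconciliation; whether either fits depends on the Sqoop version in use, so treat this as a pointer rather than a verified fix for this setup.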