I am trying to run sqoop import with --incremental lastmodified to pick up both updated and inserted records. Here is the table:
mysql> describe inc_lastmod_test;
+-------------------+-------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | 0 | |
| value | varchar(20) | YES | | NULL | |
| last_updated_date | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------------+-------------+------+-----+-------------------+-----------------------------+
The data in the table is:
mysql> select * from inc_lastmod_test;
+----+----------+---------------------+
| id | value | last_updated_date |
+----+----------+---------------------+
| 1 | first | 2016-01-01 00:00:00 |
| 2 | second | 2016-06-04 06:56:49 |
| 3 | newthird | 2016-06-04 07:06:40 |
| 4 | fourth | 2016-01-04 00:00:00 |
| 5 | fifth | 2016-01-05 00:00:00 |
| 6 | sixsth | 2016-01-06 00:00:00 |
| 7 | seventh | 2016-01-07 00:00:00 |
| 8 | eighth | 2016-06-04 07:04:58 |
+----+----------+---------------------+
I had previously imported the table into HDFS, where it is in the following state:
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/inc_lastmod_test/part*
1,first,2016-01-01 00:00:00.0
2,second,2016-06-04 06:56:49.0
3,third,2016-01-03 00:00:00.0
4,fourth,2016-01-04 00:00:00.0
5,fifth,2016-01-05 00:00:00.0
6,sixsth,2016-01-06 00:00:00.0
7,seventh,2016-01-07 00:00:00.0
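Given the current data in the MySQL table, I expect the HDFS directory to contain:
1) a new record for id = 8
2) an updated record for id = 3 (changed value)
However, after running the following command, it looks as if the records were only inserted and the existing records were not updated: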
[cloudera@quickstart ~]$ sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00"
Could you tell me where I went wrong?
Answer 0 (score: 0)
Try this:
[cloudera@quickstart ~]$ sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
--last-value "2016-01-08 00:00:00.0"
Answer 1 (score: 0)
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table inc_lastmod_test \
--append \
--incremental lastmodified \
--check-column last_updated_date \
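--last-value "2016-01-08 00:00:00"
An incremental import does not modify the existing files (/user/cloudera/inc_lastmod_test/part*). It fetches all records modified on or after the given timestamp and writes them to new files. With --append, those new files are simply added next to the old ones, so the old row for id = 3 remains in HDFS alongside the new one.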
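To see what the appended import leaves behind and what a merge on the key would produce, here is a minimal local sketch in plain shell. It does not call Sqoop at all: `latest_per_id` is a hypothetical helper name, and the input lines are a subset of the rows shown in the question (ids 2, 3 and 8 all have a last_updated_date later than the --last-value, so an appended run would write all three again). It assumes comma-separated rows with no commas inside values, and timestamps in the sortable format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop): read id,value,timestamp rows on
# stdin and keep only the newest row for each id.
latest_per_id() {
  # Sort by id (numeric), then by timestamp descending, so the newest row
  # for each id comes first; awk then keeps the first row seen per id.
  sort -t, -k1,1n -k3,3r | awk -F, '!seen[$1]++'
}

# Rows from the earlier import, followed by the rows an appended
# incremental run would add (ids 2, 3 and 8).
printf '%s\n' \
  '2,second,2016-06-04 06:56:49.0' \
  '3,third,2016-01-03 00:00:00.0' \
  '2,second,2016-06-04 06:56:49.0' \
  '3,newthird,2016-06-04 07:06:40.0' \
  '8,eighth,2016-06-04 07:04:58.0' | latest_per_id
# Prints:
# 2,second,2016-06-04 06:56:49.0
# 3,newthird,2016-06-04 07:06:40.0
# 8,eighth,2016-06-04 07:04:58.0
```

In Sqoop itself this reconciliation is normally done on the cluster rather than by hand. The Sqoop documentation describes a --merge-key option for lastmodified imports and a separate sqoop merge tool; whether they are available depends on your Sqoop version, so check its user guide before replacing --append.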