有人可以帮我解决sqoop incremental中append和lastmodified模式之间的确切区别吗?
当使用追加模式(使用--check-column作为时间戳)可以完成相同的操作时,需要lastmodified的什么。它的工作方式相同,并导入更新和插入的记录。
答案 0 :(得分:1)
模式:附加
mysql> describe emp;
+--------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(100) | YES | | NULL | |
| deg | varchar(100) | YES | | NULL | |
| salary | int(11) | YES | | NULL | |
| dep | varchar(10) | YES | | NULL | |
+--------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
使用sqoop命令导入数据。
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
-m 3 \
--table emp \
--split-by id \
--columns id,name,deg \
--warehouse-dir /user/sqoop/ \
--delete-target-dir \
--as-textfile
HDFS输出
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
drwxr-xr-x - cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
mysql> select * from emp;
+------+---------+--------------+--------+------+
| id | name | deg | salary | dep |
+------+---------+--------------+--------+------+
| 1201 | gopal | manager | 50000 | tp |
| 1202 | manisha | Proof reader | 50000 | TP |
| 1203 | php dev | TECH WRITER | 50000 | AC |
| 1204 | Nilesh | Domino dev | 70000 | AF |
| 1205 | Vinayak | Java dev | 50000 | IT |
| 1206 | Amish | Cog dev | 60000 | IT |
| 1207 | Jatin | Oracel dev | 40001 | IT |
| 1208 | Viren | Java dev | 70004 | IT |
| 1209 | Ashish | Oracel dev | 40001 | IT |
| 1210 | Satish | Java dev | 70004 | IT |
+------+---------+--------------+--------+------+
10 rows in set (0.00 sec)
在表格中插入新记录。
mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
Query OK, 1 row affected (0.03 sec)
mysql> select * from emp;
+------+---------+--------------+--------+------+
| id | name | deg | salary | dep |
+------+---------+--------------+--------+------+
| 1201 | gopal | manager | 50000 | tp |
| 1202 | manisha | Proof reader | 50000 | TP |
| 1203 | php dev | TECH WRITER | 50000 | AC |
| 1204 | Nilesh | Domino dev | 70000 | AF |
| 1205 | Vinayak | Java dev | 50000 | IT |
| 1206 | Amish | Cog dev | 60000 | IT |
| 1207 | Jatin | Oracel dev | 40001 | IT |
| 1208 | Viren | Java dev | 70004 | IT |
| 1209 | Ashish | Oracel dev | 40001 | IT |
| 1210 | Satish | Java dev | 70004 | IT |
| 1211 | Jag | be | 20000 | IT |
+------+---------+--------------+--------+------+
11 rows in set (0.00 sec)
增量导入命令
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1210 \
--warehouse-dir /user/sqoop/ \
--as-textfile
导入后
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
-rw-r--r-- 1 cloudera cloudera 21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
[cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
1211,Jag,be,20000,IT
模式:lastmodified
mysql> describe orders;
+-------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+----------------+
| order_id | int(11) | NO | PRI | NULL | auto_increment |
| order_date | datetime | NO | | NULL | |
| order_customer_id | int(11) | NO | | NULL | |
| order_status | varchar(45) | NO | | NULL | |
+-------------------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
将订单导入hdfs
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--target-dir /user/sqoop/orders \
--as-textfile
导入后
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
-rw-r--r-- 1 cloudera supergroup 753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
-rw-r--r-- 1 cloudera supergroup 752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
-rw-r--r-- 1 cloudera supergroup 752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003
更新订单数据
mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+-----------------+
| order_id | order_date | order_customer_id | order_status |
+----------+---------------------+-------------------+-----------------+
| 10 | 2013-07-25 00:00:00 | 5648 | PENDING_PAYMENT |
+----------+---------------------+-------------------+-----------------+
1 row in set (0.00 sec)
mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+--------------+
| order_id | order_date | order_customer_id | order_status |
+----------+---------------------+-------------------+--------------+
| 10 | 2017-12-02 16:19:23 | 5648 | CLOSED |
+----------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)
导入其他数据
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--check-column order_date \
--merge-key order_id \
--incremental lastmodified \
--target-dir /user/sqoop/orders1 \
--as-textfile
<强>输出强>
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
-rw-r--r-- 1 cloudera cloudera 0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
-rw-r--r-- 1 cloudera cloudera 2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
注意:如果我们使用与之前相同的目录(订单),则会删除旧文件并创建新部件。