带时间戳列

时间:2017-11-21 07:17:47

标签: increment sqoop

有人可以帮我解决sqoop incremental中append和lastmodified模式之间的确切区别吗?

当使用追加模式(使用--check-column作为时间戳)可以完成相同的操作时,需要lastmodified的什么。它的工作方式相同,并导入更新和插入的记录。

1 个答案:

答案 0 :(得分:1)

  1. 模式:append可用于您知道最后一个值的列。
  2. 模式:lastmodified模式可用于timestamp列。可能很难记住上次修改的时间戳。如果您知道上次修改的时间戳,您可以采用第一种方法。
  3. 模式:附加

    mysql> describe emp;
    +--------+--------------+------+-----+---------+-------+
    | Field  | Type         | Null | Key | Default | Extra |
    +--------+--------------+------+-----+---------+-------+
    | id     | int(11)      | YES  |     | NULL    |       |
    | name   | varchar(100) | YES  |     | NULL    |       |
    | deg    | varchar(100) | YES  |     | NULL    |       |
    | salary | int(11)      | YES  |     | NULL    |       |
    | dep    | varchar(10)  | YES  |     | NULL    |       |
    +--------+--------------+------+-----+---------+-------+
    5 rows in set (0.00 sec)
    

    使用sqoop命令导入数据。

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    -m 3 \
    --table emp \
    --split-by id \
    --columns id,name,deg \
    --warehouse-dir /user/sqoop/ \
    --delete-target-dir \
    --as-textfile
    

    HDFS输出

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
    drwxr-xr-x   - cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp
    -rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
    -rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
    -rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
    -rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
    
    
    mysql> select * from emp;
    +------+---------+--------------+--------+------+
    | id   | name    | deg          | salary | dep  |
    +------+---------+--------------+--------+------+
    | 1201 | gopal   | manager      |  50000 | tp   |
    | 1202 | manisha | Proof reader |  50000 | TP   |
    | 1203 | php dev | TECH WRITER  |  50000 | AC   |
    | 1204 | Nilesh  | Domino dev   |  70000 | AF   |
    | 1205 | Vinayak | Java dev     |  50000 | IT   |
    | 1206 | Amish   | Cog dev      |  60000 | IT   |
    | 1207 | Jatin   | Oracel dev   |  40001 | IT   |
    | 1208 | Viren   | Java dev     |  70004 | IT   |
    | 1209 | Ashish  | Oracel dev   |  40001 | IT   |
    | 1210 | Satish  | Java dev     |  70004 | IT   |
    +------+---------+--------------+--------+------+
    10 rows in set (0.00 sec)
    

    在表格中插入新记录。

    mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
    Query OK, 1 row affected (0.03 sec)
    
    mysql> select * from emp;
    +------+---------+--------------+--------+------+
    | id   | name    | deg          | salary | dep  |
    +------+---------+--------------+--------+------+
    | 1201 | gopal   | manager      |  50000 | tp   |
    | 1202 | manisha | Proof reader |  50000 | TP   |
    | 1203 | php dev | TECH WRITER  |  50000 | AC   |
    | 1204 | Nilesh  | Domino dev   |  70000 | AF   |
    | 1205 | Vinayak | Java dev     |  50000 | IT   |
    | 1206 | Amish   | Cog dev      |  60000 | IT   |
    | 1207 | Jatin   | Oracel dev   |  40001 | IT   |
    | 1208 | Viren   | Java dev     |  70004 | IT   |
    | 1209 | Ashish  | Oracel dev   |  40001 | IT   |
    | 1210 | Satish  | Java dev     |  70004 | IT   |
    | 1211 | Jag     | be           |  20000 | IT   |
    +------+---------+--------------+--------+------+
    11 rows in set (0.00 sec)
    

    增量导入命令

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --table emp \
    --split-by id \
    --check-column id \
    --incremental append \
    --last-value 1210 \
    --warehouse-dir /user/sqoop/ \
    --as-textfile
    

    导入后

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
    -rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
    -rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
    -rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
    -rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
    -rw-r--r--   1 cloudera cloudera           21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
    [cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
    1211,Jag,be,20000,IT
    

    模式:lastmodified

    mysql> describe orders;
    
    +-------------------+-------------+------+-----+---------+----------------+
    
    | Field             | Type        | Null | Key | Default | Extra          |
    
    +-------------------+-------------+------+-----+---------+----------------+
    
    | order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
    
    | order_date        | datetime    | NO   |     | NULL    |                |
    
    | order_customer_id | int(11)     | NO   |     | NULL    |                |
    
    | order_status      | varchar(45) | NO   |     | NULL    |                |
    
    +-------------------+-------------+------+-----+---------+----------------+
    
    4 rows in set (0.00 sec)    
    

    将订单导入hdfs

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --table orders \
    --split-by order_id  \
    --target-dir /user/sqoop/orders \
    --as-textfile
    

    导入后

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
    -rw-r--r--   1 cloudera supergroup          0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
    -rw-r--r--   1 cloudera supergroup     741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
    -rw-r--r--   1 cloudera supergroup     753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
    -rw-r--r--   1 cloudera supergroup     752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
    -rw-r--r--   1 cloudera supergroup     752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003
    

    更新订单数据

    mysql> select * from orders where order_id=10;
    +----------+---------------------+-------------------+-----------------+
    | order_id | order_date          | order_customer_id | order_status    |
    +----------+---------------------+-------------------+-----------------+
    |       10 | 2013-07-25 00:00:00 |              5648 | PENDING_PAYMENT |
    +----------+---------------------+-------------------+-----------------+
    1 row in set (0.00 sec)
    
    mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    mysql> select * from orders where order_id=10;
    +----------+---------------------+-------------------+--------------+
    | order_id | order_date          | order_customer_id | order_status |
    +----------+---------------------+-------------------+--------------+
    |       10 | 2017-12-02 16:19:23 |              5648 | CLOSED       |
    +----------+---------------------+-------------------+--------------+
    1 row in set (0.00 sec)
    

    导入其他数据

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --table orders \
    --split-by order_id  \
    --check-column order_date \
    --merge-key order_id \
    --incremental lastmodified \
    --target-dir /user/sqoop/orders1 \
    --as-textfile
    

    <强>输出

    [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
    -rw-r--r--   1 cloudera cloudera          0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
    -rw-r--r--   1 cloudera cloudera    2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
    

    注意:如果我们使用与之前相同的目录(订单),则会删除旧文件并创建新部件。