Scheduling and automating Sqoop import/export jobs

Date: 2015-06-08 23:22:22

Tags: shell hadoop automation hive sqoop

I have a Sqoop job that needs to import data from Oracle into HDFS.

The Sqoop query I am using is:
sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '1' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test1 --fields-terminated-by '\t'

I keep re-running the same query, changing partitionid from 1 to 96, so I would have to execute the sqoop import command manually 96 times. The table 'ORDERS' contains millions of rows, and each row has a partitionid from 1 to 96. I need to import 10001 rows for each partitionid into HDFS.

Is there any way to do this? How can I automate the Sqoop job?

2 Answers:

Answer 0 (score: 0)

Use crontab for scheduling. The crontab documentation can be found here, or run man crontab in a terminal.

Put the sqoop import command in a shell script, and use crontab to execute that shell script.
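As a minimal sketch, assuming the wrapper script is saved at /home/sqoop/run_import.sh (a hypothetical path), a crontab entry that runs it every night at 01:00 could look like this:

# open the crontab editor with: crontab -e
# fields: minute hour day-of-month month day-of-week command
0 1 * * * /home/sqoop/run_import.sh >> /home/sqoop/sqoop_import.log 2>&1

Redirecting stdout and stderr to a log file makes failed imports easier to diagnose, since cron itself gives little feedback.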

Answer 1 (score: 0)

Run the script: $ ./script.sh 20    # for the 20th partition

ramisetty@HadoopVMbox:~/ramu$ cat script.sh
#!/bin/bash

# take the partition id from the first command-line argument
PART_ID=$1
# reuse the partition id as the name of the per-partition target directory
TARGET_DIR_ID=$PART_ID
echo "PART_ID: $PART_ID  TARGET_DIR_ID: $TARGET_DIR_ID"
# import the first 10,000 rows (rownum < 10001) of this partition into /test/<partition id>
sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '$PART_ID' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test/$TARGET_DIR_ID --fields-terminated-by '\t'
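For example, to import partition 20 and then verify the result in HDFS (the /test/20 path follows from the script above):

$ ./script.sh 20
$ hdfs dfs -ls /test/20    # list the files Sqoop wrote for partition 20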

To run all partitions from 1 to 96 in one shot:

ramisetty@HadoopVMbox:~/ramu$ cat script_for_all.sh
#!/bin/bash

# loop over every partition id from 1 to 96
for part_id in {1..96}
do
  PART_ID=$part_id
  TARGET_DIR_ID=$PART_ID
  echo "PART_ID: $PART_ID  TARGET_DIR_ID: $TARGET_DIR_ID"
  # import the first 10,000 rows of this partition into its own directory under /test
  sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '$PART_ID' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test/$TARGET_DIR_ID --fields-terminated-by '\t'
done
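Combining this with Answer 0, the batch script can itself be scheduled with cron. A sketch, assuming the script lives in /home/ramisetty/ramu (the directory shown in the prompts above) and assuming a nightly 02:00 run:

$ chmod +x /home/ramisetty/ramu/script_for_all.sh
# hypothetical crontab entry: run the full 96-partition import nightly at 02:00
0 2 * * * /home/ramisetty/ramu/script_for_all.sh >> /home/ramisetty/ramu/script_for_all.log 2>&1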