从oozie进口Sqoop蜂巢

时间:2016-12-21 05:18:32

标签: hive oozie sqoop

我正在使用Cloudera Quick Start Docker镜像

快速入门图片中安装了mysql。当我从命令行使用以下sqoop命令导入类别表时,它可以工作,我可以看到创建了类别表



sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite




然后我作为cloudera用户登录Hue,我确实用单个sqoop任务创建了一个新的oozie工作流程,但是当我尝试执行sqoop能够将数据下载到HDFS中,但是当它尝试创建hive表时最重要的是它失败了

这就是我的workflow.xml的样子



<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
    <start to="sqoop-4467"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-4467">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
</command>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
          <sla:info>
            <sla:nominal-time>${nominal_time}</sla:nominal-time>
            <sla:should-end>${30 * MINUTES}</sla:should-end>
          </sla:info>
    </action>
    <end name="End"/>
</workflow-app>
&#13;
&#13;
&#13;

这就是我的job.properties文件的样子

&#13;
&#13;
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
nominal_time=2016-12-20T20:53Z
jobTracker=quickstart.cloudera:8032
&#13;
&#13;
&#13;

作业失败后,当我检查/ user / home / cloudera文件夹时,我可以看到包含数据的类别文件夹,但我没有看到正在创建的hive表。这是我在jobhistory服务器中看到的失败作业的错误

&#13;
&#13;
Sqoop command arguments :
             import
             --connect
             jdbc:mysql://localhost/retail_db
             --username
             root
             --password
             cloudera
             -m
             1
             --table
             categories
             --hive-import
             --hive-overwrite
Fetching child yarn jobs
tag id : oozie-3ff81b7743470e73dcb44de6729a66d9
Child yarn jobs are found - 
=================================================================

>>> Invoking Sqoop command line now >>>

6223 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6302 [uber-SubtaskRunner] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6-cdh5.7.0
6336 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
6336 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.BaseSqoopTool  - Using Hive-specific delimiters for output. You can override
6336 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.BaseSqoopTool  - delimiters with --fields-terminated-by, etc.
6367 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6654 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.MySQLManager  - Preparing to use a MySQL streaming resultset.
6666 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
7250 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7279 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7281 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
9303 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/4fd8773510dfe4082d136b2ab7d27eb3/categories.jar
9314 [uber-SubtaskRunner] WARN  org.apache.sqoop.manager.MySQLManager  - It looks like you are importing from mysql.
9314 [uber-SubtaskRunner] WARN  org.apache.sqoop.manager.MySQLManager  - This transfer can be faster! Use the --direct
9314 [uber-SubtaskRunner] WARN  org.apache.sqoop.manager.MySQLManager  - option to exercise a MySQL-specific fast path.
9314 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.MySQLManager  - Setting zero DATETIME behavior to convertToNull (mysql)
9318 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of categories
9388 [uber-SubtaskRunner] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
10238 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
29055 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 1.0049 KB in 19.659 seconds (52.3425 bytes/sec)
29061 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 58 records.
29076 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
29097 [uber-SubtaskRunner] INFO  org.apache.sqoop.hive.HiveImport  - Loading uploaded data into Hive
Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000012-161221020706124-oozie-oozi-W/sqoop-4467--sqoop/action-data.seq

Oozie Launcher ends

 
&#13;
&#13;
&#13;

1 个答案:

答案 0 :(得分:1)

你是否将hive-site.xml复制到HDFS?你可以使用--target-dir将表导入hdfs路径并将hive表的位置设置为指向该路径

相关问题