Oozie shell-script action with Sqoop import fails

Date: 2017-02-02 05:43:16

Tags: hadoop oozie hue sqoop

I am trying to run an Oozie workflow action that uses a shell script with Sqoop to import data from MySQL.

Workflow steps:

  1. Delete any existing directory.

  2. A Java action reads the metadata Hive table and creates the table_metadata directory and *.cf files.

  3. A shell script iterates over the table_metadata directory and scans the configuration files (*.cf). Each file contains the name of a table to import; the script reads that table name into the sq_name variable, which is then used in the Sqoop import query.

    The same script, Sqoop command included, works fine when I run it from the command line (sh script.sh).

    However, when it runs as a shell action in an Oozie workflow (via the Cloudera Hue GUI), it fails with the errors below.

    Any ideas why the Oozie job fails?

    Shell script:

    hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
    table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
            if hadoop fs -test -e $hdfs_path
            then
             for file in $(hadoop fs -ls $hdfs_path | grep -o -e "$hdfs_path/*.*");
              do
               echo ${file}
               TABLENAME=$(hadoop fs -cat ${file});
               echo $TABLENAME
               HDFSPATH=$table_temp_path
               sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --table departments --username=retail_dba --password=cloudera --direct -m 1 --delete-target-dir --target-dir $table_temp_path
             done
            fi
    

    WorkFlow.xml

    <workflow-app name="RDB2Hive" xmlns="uri:oozie:workflow:0.5">
        <start to="fs-1051"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="fs-1051">
            <fs>
                  <delete path='${nameNode}/user/cloudera/workflow/table_metadata'/>
                  <mkdir path='${nameNode}/user/cloudera/workflow/table_metadata'/>
            </fs>
            <ok to="java-9025"/>
            <error to="Kill"/>
        </action>
        <action name="java-9025">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <main-class>org.rd2h.app.LoadMetaData</main-class>
                <arg>load_metadata</arg>
                <arg>/user/cloudera/workflow/table_metadata</arg>
            </java>
            <ok to="shell-d3bf"/>
            <error to="Kill"/>
        </action>
        <action name="shell-d3bf">
            <shell xmlns="uri:oozie:shell-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>import_script.sh</exec>
                <file>/user/cloudera/workflow/scripts/import_script.sh#import_script.sh</file>
                  <capture-output/>
            </shell>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>
    

    MR error log:

    Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/cloudera/.staging/job_1486009475788_0032/job.splitmetainfo
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1580)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1444)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1402)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1333)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1101)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1540)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1536)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1469)
        ***Caused by: java.io.FileNotFoundException: File does not exist: hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/cloudera/.staging/job_1486009475788_0032/job.splitmetainfo***
        at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1219)
        at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
        at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1575)
    

    Oozie error log:

    Stdoutput 2017-02-01 20:57:31,101 INFO  [main] sqoop.Sqoop (Sqoop.java:<init>(92)) - Running Sqoop version: 1.4.6-cdh5.8.0
    Stdoutput 2017-02-01 20:57:31,113 WARN  [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1042)) - Setting your password on the command-line is insecure. Consider using -P instead.
    Stdoutput 2017-02-01 20:57:31,304 INFO  [main] manager.MySQLManager (MySQLManager.java:initOptionDefaults(71)) - Preparing to use a MySQL streaming resultset.
    Stdoutput 2017-02-01 20:57:31,309 INFO  [main] tool.CodeGenTool (CodeGenTool.java:generateORM(92)) - Beginning code generation
    Stdoutput 2017-02-01 20:57:31,560 INFO  [main] manager.SqlManager (SqlManager.java:execute(776)) - Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
    Stdoutput 2017-02-01 20:57:31,579 INFO  [main] manager.SqlManager (SqlManager.java:execute(776)) - Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
    Stdoutput 2017-02-01 20:57:31,582 INFO  [main] orm.CompilationManager (CompilationManager.java:findHadoopJars(94)) - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
    Stdoutput 2017-02-01 20:57:32,587 INFO  [main] orm.CompilationManager (CompilationManager.java:jar(330)) - Writing jar file: /tmp/sqoop-yarn/compile/94cbe03d9d51f6ccc47ddd3ca98032be/departments.jar
    Stdoutput 2017-02-01 20:57:33,182 INFO  [main] tool.ImportTool (ImportTool.java:deleteTargetDir(544)) - Destination directory hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp is not present, hence not deleting.
    Stdoutput 2017-02-01 20:57:33,187 INFO  [main] manager.DirectMySQLManager (DirectMySQLManager.java:importTable(83)) - Beginning mysqldump fast path import
    Stdoutput 2017-02-01 20:57:33,187 INFO  [main] mapreduce.ImportJobBase (ImportJobBase.java:runImport(242)) - Beginning import of departments
    Stdoutput 2017-02-01 20:57:33,188 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    Stdoutput 2017-02-01 20:57:33,203 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar
    Stdoutput 2017-02-01 20:57:33,210 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    Stdoutput 2017-02-01 20:57:33,253 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at localhost/127.0.0.1:8032
    Stdoutput 2017-02-01 20:57:35,040 INFO  [main] db.DBInputFormat (DBInputFormat.java:setTxIsolation(192)) - Using read commited transaction isolation
    Stdoutput 2017-02-01 20:57:35,072 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(202)) - number of splits:1
    Stdoutput 2017-02-01 20:57:35,190 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(291)) - Submitting tokens for job: job_1486009475788_0032
    Stdoutput 2017-02-01 20:57:35,190 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(293)) - Kind: mapreduce.job, Service: job_1486009475788_0029, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@76f3da25)
    Stdoutput 2017-02-01 20:57:35,198 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(293)) - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1486011413559, maxDate=1486616213559, sequenceNumber=67, masterKeyId=2)
    Stdoutput 2017-02-01 20:57:35,439 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(260)) - Submitted application application_1486009475788_0032
    Stdoutput 2017-02-01 20:57:35,463 INFO  [main] mapreduce.Job (Job.java:submit(1311)) - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1486009475788_0032/
    Stdoutput 2017-02-01 20:57:35,463 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Running job: job_1486009475788_0032
    Stdoutput 2017-02-01 20:57:41,569 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1377)) - Job job_1486009475788_0032 running in uber mode : false
    Stdoutput 2017-02-01 20:57:41,569 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1384)) -  map 0% reduce 0%
    Stdoutput 2017-02-01 20:57:41,682 INFO  [main] mapred.ClientServiceDelegate (ClientServiceDelegate.java:getProxy(277)) - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
    Stdoutput 2017-02-01 20:57:41,717 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1397)) - Job job_1486009475788_0032 failed with state FAILED due to: 
    Stdoutput 2017-02-01 20:57:41,725 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(393)) - The MapReduce job has already been retired. Performance
    Stdoutput 2017-02-01 20:57:41,725 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(394)) - counters are unavailable. To get this information, 
    Stdoutput 2017-02-01 20:57:41,726 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(395)) - you will need to enable the completed job store on 
    Stdoutput 2017-02-01 20:57:41,726 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(396)) - the jobtracker with:
    Stdoutput 2017-02-01 20:57:41,726 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(397)) - mapreduce.jobtracker.persist.jobstatus.active = true
    Stdoutput 2017-02-01 20:57:41,726 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(398)) - mapreduce.jobtracker.persist.jobstatus.hours = 1
    Stdoutput 2017-02-01 20:57:41,726 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(399)) - A jobtracker restart is required for these settings
    Stdoutput 2017-02-01 20:57:41,726 INFO  [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(400)) - to take effect.
    Stdoutput 2017-02-01 20:57:41,726 ERROR [main] tool.ImportTool (ImportTool.java:run(631)) - Error during import: Import job failed!
    Exit code of the Shell command 1
    <<< Invocation of Shell command completed <<<
    
    
    <<< Invocation of Main class completed <<<
    
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
    
    Oozie Launcher failed, finishing Hadoop job gracefully
    
    Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000013-170201202514643-oozie-oozi-W/shell-d3bf--shell/action-data.seq
    
    Oozie Launcher ends
    

1 Answer:

Answer 0 (score: 2)

You may want to set an environment variable for the shell action:

HADOOP_USER_NAME=${wf:user()}
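
In the workflow.xml above, this can be passed to the shell action with an `<env-var>` element. A minimal sketch of the `shell-d3bf` action with that one line added (everything else copied from the question; in the shell-action schema, `<env-var>` comes after `<exec>` and before `<file>`):

```xml
<action name="shell-d3bf">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>import_script.sh</exec>
        <!-- run the script's Hadoop/Sqoop jobs as the workflow user,
             not as the yarn/oozie system user -->
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>/user/cloudera/workflow/scripts/import_script.sh#import_script.sh</file>
        <capture-output/>
    </shell>
    <ok to="End"/>
    <error to="Kill"/>
</action>
```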

Also, since you appear to be importing multiple tables, you may want to create a subdirectory under the target directory for each table.
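
A sketch of that per-table layout, reusing the paths and variables from the question's script (the subdirectory scheme itself is an assumption, and the hard-coded table name here stands in for the value read from each *.cf file):

```shell
#!/bin/sh
# Base directory from the question's script.
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'

# In the real loop this comes from: TABLENAME=$(hadoop fs -cat ${file})
TABLENAME='departments'

# One subdirectory per table, so imports do not overwrite each other.
target_dir="$table_temp_path/$TABLENAME"
echo "$target_dir"

# The Sqoop call would then use the derived path, e.g.:
# sqoop import ... --table "$TABLENAME" --delete-target-dir --target-dir "$target_dir"
```

With this, `--delete-target-dir` only clears the directory belonging to the table being re-imported, rather than the shared hive_temp directory.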