Oozie Sqoop job - cannot restore job

时间:2016-02-03 03:47:30

标签: oozie sqoop hortonworks-data-platform

On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.2, I'm trying to create a coordinator app that will execute sqoop jobs on a daily basis. I need the sqoop action to execute jobs because these are incremental imports.

I've configured sqoop-site.xml and started the sqoop-metastore and I'm able to create, list, and delete jobs via the command line but the workflow encounters the error: Cannot restore job: streamsummary_incremental

stderr

Sqoop command arguments :
             job
             --exec
             streamsummary_incremental
Fetching child yarn jobs
tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2
2016-02-03 09:46:47,193 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myHost>/<myIP>:8032
Child yarn jobs are found - 
=================================================================

>>> Invoking Sqoop command line now >>>

2241 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-02-03 09:46:47,404 WARN  [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2263 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6.2.3.4.0-3485
2016-02-03 09:46:47,426 INFO  [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage  - Cannot restore job: streamsummary_incremental
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage  - (No such job)
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job)
2553 [main] ERROR org.apache.sqoop.tool.JobTool  - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
    at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
    at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
    at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
    at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

sqoop-site.xml

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>false</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://<myhost>:12345</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>12345</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

workflow.xml

  <action name="sqoop-import-job">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${outputDir}"/>
      </prepare>
      <arg>job</arg>
      <arg>--exec</arg>
      <arg>${jobId}</arg>
    </sqoop>
    <ok to="hive-load"/>
    <error to="kill-sqoop"/>
  </action>

Additional info:

  • We're only running a single-node cluster.
  • Only Sqoop Client is installed.

I'm thinking maybe Oozie isn't able to connect to the metastore because we don't have sqoop server? Could anyone confirm this? If not that, could I have missed anything else?

Thanks!

1 个答案:

答案 0 :(得分:0)

我设法在评论中借助@SamsonScharfrichter解决了这个问题。我明确地在Oozie工作流程中传递了Metastore URL并且它有效:

<arg>job</arg>
<arg>--meta-connect</arg>
<arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg>
<arg>--exec</arg>
<arg>myjob</arg>

似乎Oozie试图连接到本地的Metastore,因为它没有sqoop-site.xml的副本,因此它不知道Metastore url(即使我是&#39; m运行单节点配置。)