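I am trying to use a Sqoop action in Hue's Oozie editor, but I cannot get it to work.
Here is what I have tried so far.
I put everything in arguments rather than in the command, following http://alvincjin.blogspot.com.au/2014/06/create-sqoop-action-in-oozie-using-hue.html
I am also trying to connect to Teradata, so I have put the JDBC jars in HDFS and added them under Files. This is how the workflow currently looks in the editor: (image: Sqoop Action)
The workflow definition is: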
<workflow-app name="Sqoop_test" xmlns="uri:oozie:workflow:0.5">
    <start to="sqoop-b20d"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-b20d">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>development</value>
                </property>
                <property>
                    <name>mapred.job.name</name>
                    <value>test_sqoop</value>
                </property>
                <property>
                    <name>mapred.task.timeout</name>
                    <value>0</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:teradata://XXXXX</arg>
            <arg>--query</arg>
            <arg>select count(*) from XXXXX</arg>
            <arg>--fetch-size</arg>
            <arg>10000</arg>
            <arg>--num-mappers</arg>
            <arg>1</arg>
            <arg>--hive-table-name</arg>
            <arg>XXXXX.tmp_sqoop_test</arg>
            <arg>--hive-import</arg>
            <arg>--hive-overwrite</arg>
            <arg>--target-dir</arg>
            <arg>/user/dXXXXX/digital/test/tmp_sqoop_test</arg>
            <arg>--username</arg>
            <arg>XXXXX</arg>
            <arg>--password</arg>
            <arg>XXXXX</arg>
            <file>/user/hue/oozie/workspaces/digital/lib/terajdbc4.jar#terajdbc4.jar</file>
            <file>/user/hue/oozie/workspaces/digital/lib/teradata-connector-1.3.4-hadoop220.jar#teradata-connector-1.3.4-hadoop220.jar</file>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
However, I get this error:
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(296)) - Error parsing arguments for import:
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-table-name
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-table-name
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: XXXXX.tmp_sqoop_test
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: tdcprdr_app_digital.tmp_sqoop_test
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-import
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-import
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-overwrite
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-overwrite
2787 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --target-dir
2016-01-06 14:13:52,115 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --target-dir
...
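My impression was that putting everything in arguments would resolve this error. The same command works fine when run from a shell script. I have also tried putting the import command and the connection string in the command section, but that does not even run. I also tried creating a minimal Sqoop action with only the query and the connection string, as follows: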
<workflow-app name="Sqoop_minimal" xmlns="uri:oozie:workflow:0.5">
    <start to="sqoop-eeeb"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-eeeb">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:teradata://tdXXXXX</arg>
            <arg>--query</arg>
            <arg>select count(*) from XXXXX</arg>
            <arg>--target-dir</arg>
            <arg>/user/dXXXXX/digital/test/tmp_sqoop_test</arg>
            <arg>--username</arg>
            <arg>XXXXX</arg>
            <arg>--password</arg>
            <arg>XXXXX</arg>
            <file>/user/hue/oozie/workspaces/digital/lib/teradata-connector-1.3.4-hadoop220.jar#teradata-connector-1.3.4-hadoop220.jar</file>
            <file>/user/hue/oozie/workspaces/digital/lib/terajdbc4.jar#terajdbc4.jar</file>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
With this workflow, I get a very vague error:
>>> Invoking Sqoop command line now >>>
2287 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-01-06 14:57:48,381 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(175)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2324 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5.3.0.0.0-249
2016-01-06 14:57:48,418 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(92)) - Running Sqoop version: 1.4.5.3.0.0.0-249
2339 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2016-01-06 14:57:48,433 WARN [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1014)) - Setting your password on the command-line is insecure. Consider using -P instead.
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
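The Oozie version is 4.1.0.3.0.0.0-249.
I have searched online for a solution, but with no luck. Any help would be appreciated. Thanks!
Links I have already looked at and tried: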
https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Sqoop-fails-with-quot-Error-parsing-arguments-for-import-quot/td-p/31930
http://stackoverflow.com/questions/25770698/sqoop-free-form-query-causing-unrecognized-arguments-in-hue-oozie
Answer 0: (score: 1)
Sqoop has no such argument as --hive-table-name. Use --hive-table instead. It should then stop reporting unrecognized arguments.
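To make the fix concrete, here is a minimal sketch of the Hive-related arguments from the first workflow with only the flag name changed. The table name is the redacted placeholder from the question, and every other argument is assumed to stay as posted. The later flags in the log (--hive-import, --hive-overwrite, --target-dir) are presumably reported only because Sqoop lists everything that follows the first flag it cannot parse, so they should not need changing.

```xml
<!-- Sketch only: same arguments as in the question, with the unsupported
     --hive-table-name flag replaced by --hive-table. -->
<arg>--hive-table</arg>
<arg>XXXXX.tmp_sqoop_test</arg>
<arg>--hive-import</arg>
<arg>--hive-overwrite</arg>
```

In the Hue editor this amounts to editing the single argument entry that holds the flag name. The value entry after it is left untouched.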