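I am having a serious problem loading data into a MySQL table using Sqoop (the sqoop command inside Oozie), with 196 failed attempts so far. With only one column of data in HDFS (here, foo) there is no problem, but with more than one column, for example 2, the data is not loaded into MySQL.
If I run Sqoop on its own, the data is loaded into MySQL, but when I put the same command into Oozie, the data is not loaded.
workflow.xml has two parts: the first loads data from a Hive table into HDFS, and the second loads the data from HDFS into MySQL.
I am using the Cloudera VM.
The error message is: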
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
-
hive> CREATE EXTERNAL TABLE IF NOT EXISTS foo (
id int,
city string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/cloudera/foo';
-
$ vi foo
1 a
4 b
hive> load data local inpath '/home/cloudera/foo' into table foo;
-
mysql> CREATE TABLE `foo` (`id` int(11) DEFAULT NULL, `city` varchar(22) DEFAULT NULL );
-
workflow.xml:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="etl-wf">
<start to="hive-node"/>
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>script.q</script>
</hive>
<ok to="sqoop-node"/>
<error to="fail"/>
</action>
<action name="sqoop-node">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>export --connect jdbc:mysql://www.abc.net/test --username rio --password r005 --table foo --export-dir /user/cloudera/test --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'</command>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Note: the first part works, i.e. the data is loaded into the Hive table test, but it is not loaded from hdfs:/user/cloudera/test into the MySQL table foo.
-
vi script.q:
CREATE EXTERNAL TABLE IF NOT EXISTS test (
id int,
city string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION
'/user/cloudera/test';
INSERT OVERWRITE table test SELECT * FROM foo;
-
stderr logs
Note: /tmp/sqoop-mapred/compile/d4f769ef09667984820f21a38ae27bb4/foo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NumberFormatException: For input string: "1 a"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
-
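Answer 0 (score: 1)
This may just be a transformation done by the logger, but it is still worth looking into: there seem to be 4 spaces between 1 and a in the logged input string. Have you checked the contents of the file in HDFS? Are the columns separated by \t?
Answer 1 (score: 1)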
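Caused by: java.lang.NumberFormatException: For input string: "1 a"
is the problem here. Change your MySQL script and use:
mysql> CREATE TABLE foo (
id varchar(11) DEFAULT NULL,
city varchar(22) DEFAULT NULL );
The problem is that the first column of the data is being parsed as an integer. This is not a solution to your problem; I am only suggesting it as a test to check whether the number format is ultimately what is failing. Please try it and let us know, and we will do our best to help.
P.S.: The character between 1 and a is most likely not a tab delimiter. It may be a space.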
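One way to test the P.S. above is to look at the raw characters of each line. The following is a minimal Python sketch, not part of Sqoop or Hive; the function names and the sample lines are made up for illustration. It mimics the failing step by splitting on the expected delimiter and parsing the first field as an integer, and it prints each line with repr so that a tab shows up as \t while a space stays a space.

```python
def first_field_as_int(line, delimiter="\t"):
    """Split a line on the delimiter and parse the first field as an int.

    Raises ValueError when the delimiter does not occur in the line,
    because the whole line is then treated as the first field.
    """
    first_field = line.rstrip("\n").split(delimiter)[0]
    return int(first_field)


def check_line(line, delimiter="\t"):
    """Return a short human-readable report for one line of input."""
    try:
        value = first_field_as_int(line, delimiter)
        return f"{line!r}: ok, id={value}"
    except ValueError:
        return f"{line!r}: first field is not an integer with delimiter {delimiter!r}"


if __name__ == "__main__":
    # Hypothetical sample lines: one tab-separated, one space-separated.
    for sample in ["1\ta\n", "4 b\n"]:
        print(check_line(sample))
```

If every line of the exported file parses cleanly with a tab as the delimiter, the data itself is probably fine and the delimiter that Sqoop actually receives becomes the next thing to check.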
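Answer 2 (score: 0)
There is no need to escape arguments when using Oozie. All escape sequences, along with the surrounding single and double quotes, must be removed. The sqoop command in the Oozie workflow should look like this:
export --connect jdbc:mysql://www.abc.net/test --username rio --password r005 --table foo --export-dir /user/cloudera/test --input-fields-terminated-by \t --input-lines-terminated-by \n
There should not be any single quotes around \t and \n.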
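To see why the quotes matter, here is a minimal Python sketch (not Oozie code) that contrasts two ways of splitting the same command string. It assumes, as this answer implies, that Oozie splits the text of the command element on whitespace and hands each piece to Sqoop unchanged, whereas a shell strips the quotes before Sqoop sees them. The variable and function names are made up for illustration.

```python
import shlex

# The delimiter options as they appear in the workflow.xml from the question.
command = r"--input-fields-terminated-by '\t' --input-lines-terminated-by '\n'"


def split_on_whitespace(cmd):
    """Split on whitespace only; quotes stay part of the arguments."""
    return cmd.split()


def split_like_shell(cmd):
    """Split the way a POSIX shell would; the single quotes are removed."""
    return shlex.split(cmd)


if __name__ == "__main__":
    print(split_on_whitespace(command))
    print(split_like_shell(command))
```

Under that assumption, the delimiter argument reaching Sqoop through Oozie still carries the single quotes, while on the command line it does not. That would fit the question's observation that the same command works when Sqoop is run directly.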
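Answer 3 (score: 0)
A possible cause is that the data was not inserted into the Hive table correctly. Can you print the result of the command below?
hive -e "select id from foo;"
If the above query prints
1 a
4 b
then edit the input file to use tabs, or change the delimiter.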
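If the file does turn out to be space-separated, rewriting it with tabs could be done along these lines. This is only a sketch under the assumption that each line holds exactly two fields (id, then city) and that only the first run of whitespace is a separator; the function names and file paths are hypothetical.

```python
def spaces_to_tab(line):
    """Replace the first run of whitespace in a line with a single tab.

    A line with a single field comes back as just that field.
    """
    parts = line.rstrip("\n").split(None, 1)
    return "\t".join(parts)


def convert_file(src_path, dst_path):
    """Write a tab-delimited copy of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(spaces_to_tab(line) + "\n")


if __name__ == "__main__":
    # Hypothetical sample line with spaces between the two fields.
    print(repr(spaces_to_tab("1    a\n")))
```

Splitting at most once keeps any spaces inside the city value intact, which matters if a city name contains more than one word.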