Problem exporting floats and doubles from HDFS to MySQL using Sqoop

Date: 2014-02-28 16:59:34

Tags: mysql hadoop sqoop

I am using Hadoop version 1.2.1 and Sqoop 1.4.4.

I am new to Hadoop/Sqoop and have run into a problem. I have data in HDFS that I want to export to MySQL, but the export keeps failing. The statement I am using is:


sqoop export --connect jdbc:mysql://{ip address}/{database} --username username -P --table {tablename} --export-dir {export-dir} --input-fields-terminated-by ',' --lines-terminated-by '\n' --verbose

The error I am getting is:

14/02/28 10:12:40 INFO mapred.JobClient: Running job: job_201402040959_0234
14/02/28 10:12:41 INFO mapred.JobClient:  map 0% reduce 0%
14/02/28 10:12:51 INFO mapred.JobClient:  map 50% reduce 0%
14/02/28 10:22:51 INFO mapred.JobClient:  map 0% reduce 0%
14/02/28 10:22:52 INFO mapred.JobClient: Task Id : attempt_201402040959_0234_m_000000_0, Status : FAILED
Task attempt_201402040959_0234_m_000000_0 failed to report status for 600 seconds. Killing!
14/02/28 10:22:52 INFO mapred.JobClient: Task Id : attempt_201402040959_0234_m_000001_0, Status : FAILED
Task attempt_201402040959_0234_m_000001_0 failed to report status for 600 seconds. Killing!
14/02/28 10:23:00 INFO mapred.JobClient:  map 50% reduce 0%
14/02/28 10:33:00 INFO mapred.JobClient:  map 0% reduce 0%
14/02/28 10:33:00 INFO mapred.JobClient: Task Id : attempt_201402040959_0234_m_000000_1, Status : FAILED
Task attempt_201402040959_0234_m_000000_1 failed to report status for 600 seconds. Killing!
14/02/28 10:33:00 INFO mapred.JobClient: Task Id : attempt_201402040959_0234_m_000001_1, Status : FAILED
Task attempt_201402040959_0234_m_000001_1 failed to report status for 600 seconds. Killing!
14/02/28 10:33:09 INFO mapred.JobClient:  map 50% reduce 0%
14/02/28 10:43:09 INFO mapred.JobClient:  map 0% reduce 0%
14/02/28 10:43:09 INFO mapred.JobClient: Task Id : attempt_201402040959_0234_m_000000_2, Status : FAILED
Task attempt_201402040959_0234_m_000000_2 failed to report status for 600 seconds. Killing!
14/02/28 10:43:10 INFO mapred.JobClient: Task Id : attempt_201402040959_0234_m_000001_2, Status : FAILED
Task attempt_201402040959_0234_m_000001_2 failed to report status for 600 seconds. Killing!
14/02/28 10:43:18 INFO mapred.JobClient:  map 50% reduce 0%
14/02/28 10:53:18 INFO mapred.JobClient:  map 25% reduce 0%
14/02/28 10:53:19 INFO mapred.JobClient:  map 0% reduce 0%
14/02/28 10:53:20 INFO mapred.JobClient: Job complete: job_201402040959_0234
14/02/28 10:53:20 INFO mapred.JobClient: Counters: 7
14/02/28 10:53:20 INFO mapred.JobClient:   Job Counters 
14/02/28 10:53:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11987
14/02/28 10:53:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/28 10:53:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/28 10:53:20 INFO mapred.JobClient:     Launched map tasks=8
14/02/28 10:53:20 INFO mapred.JobClient:     Data-local map tasks=8
14/02/28 10:53:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/02/28 10:53:20 INFO mapred.JobClient:     Failed map tasks=1
14/02/28 10:53:20 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 2,441.242 seconds (0 bytes/sec)
14/02/28 10:53:20 INFO mapreduce.ExportJobBase: Exported 0 records.
14/02/28 10:53:20 ERROR tool.ExportTool: Error during export: Export job failed!

A sample of the data is:

201110,1.8181818181818181
201111,1.4597701149425288
201112,1.766990291262136
20119,1.6153846153846154
20121,1.5857142857142856
201210,1.55
201211,1.5294117647058822
201212,1.6528925619834711
20122,1.5789473684210527
20123,1.4848484848484849
20124,1.654320987654321
20125,1.5942028985507246
20126,1.5333333333333334
20127,1.4736842105263157
20128,1.4666666666666666
20129,1.4794520547945205
20131,1.6875
201310,8.233183856502242
201311,8.524886877828054
201312,9.333333333333334
20132,1.7272727272727273
20133,3.42
20134,6.380597014925373
20135,9.504716981132075
20136,8.538812785388128
20137,8.609649122807017
20138,8.777272727272727
20139,8.506787330316742
20141,4.741784037558685

I tried exporting a similar data set with the same export statement, only with integers instead of doubles, and it succeeded. I also tried a similar data set with floats instead of doubles, and that failed as well. Can someone please give me a hint as to why this is not working? Am I doing something wrong with data types that do not fit MySQL?
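For reference, the target table for this data would need an integer-capable column for the first field and a floating-point column for the second. A minimal sketch of such a table created through the mysql client; the table and column names (monthly_ratios, period, ratio) are assumptions, not taken from the question:

# Hypothetical target table matching the sample data above;
# names are placeholders, only the column types are the point here.
mysql -u username -p {database} -e "
CREATE TABLE monthly_ratios (
  period INT,     -- e.g. 201110
  ratio  DOUBLE   -- e.g. 1.8181818181818181
);"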

I also tried running the same query with the following addition:


-m 1

It produces the same error as above, except that the map step reaches 100% instead of only 50%.

Thanks, and please let me know if I should provide any additional information.

2 Answers:

Answer 0 (score: 0):

Please update the question with the Hadoop, Sqoop, and MySQL versions you are using so that the problem can be reproduced.

I will assume you are using Hadoop 0.21.0. If that is the case, the problem may be caused by the org.apache.sqoop.mapreduce.ProgressThread class using TaskInputOutputContext, which does not correctly report progress to the underlying reporter, as described in [MAPREDUCE-1905].

If you are on 0.21.0, you will need to move to 0.21.1 or another Hadoop release.

Otherwise I would suspect some issue in ProgressThread or in the way Sqoop reports progress. If that does not help, there may be more information in the YARN or MR1 logs.

Default YARN log folder (set in etc/hadoop/yarn-env.sh):

cd $HADOOP_YARN_HOME/logs

Default MR1 log folder (set in etc/hadoop/mapred-env.sh):

cd $HADOOP_MAPRED_HOME/logs
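On Hadoop 1.x, which the question is using, per-task logs are typically also available under the TaskTracker's userlogs directory. A hedged example of inspecting one of the failed attempts from the output above; the log root is an assumption and may differ on your install:

# Assumes the stock Hadoop 1.x log layout under $HADOOP_HOME/logs;
# adjust the path if HADOOP_LOG_DIR points elsewhere.
cd $HADOOP_HOME/logs/userlogs/job_201402040959_0234

# Each failed attempt has its own directory; syslog usually holds the
# stack trace showing where the mapper was stuck before the 600s timeout.
ls attempt_201402040959_0234_m_000000_0/
less attempt_201402040959_0234_m_000000_0/syslog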

Answer 1 (score: 0):

The error was caused by underscores in the column names. Apparently you cannot have underscores in the column names.
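If renaming the columns to underscore-free names is the fix, one way to make the field-to-column mapping explicit is Sqoop's --columns option. A sketch under that assumption; the names period and ratio are hypothetical placeholders for the renamed columns:

# Sketch only: "period" and "ratio" stand in for the renamed,
# underscore-free MySQL columns; other placeholders as in the question.
sqoop export \
  --connect jdbc:mysql://{ip address}/{database} \
  --username username -P \
  --table {tablename} \
  --columns "period,ratio" \
  --export-dir {export-dir} \
  --input-fields-terminated-by ',' \
  --lines-terminated-by '\n'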