sqoop-import-mainframe with large datasets

Date: 2017-03-29 09:52:22

Tags: hadoop ftp sqoop mainframe

I am trying to import a mainframe dataset into HDFS using sqoop-import-mainframe. The Sqoop command is executed from Oozie:

<action name="sqoop-load">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/test-sqoop"/>
        </prepare>
        <command>import-mainframe --connect zhostname --username USERNAME --password PASSWORD --dataset AAAA.BBBB.CCCC.DDDD --target-dir ${nameNode}/test-sqoop -m 1 --verbose</command>
    </sqoop>
    <ok to="End"/>
    <error to="Kill"/>
</action>
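
For reference, the same import can also be launched directly from a shell with the Sqoop CLI, which helps separate Oozie-side problems from Sqoop/FTP problems. This is only a sketch built from the command in the action above; the HDFS path /test-sqoop stands in for the resolved ${nameNode}/test-sqoop target directory:

sqoop import-mainframe \
    --connect zhostname \
    --username USERNAME --password PASSWORD \
    --dataset AAAA.BBBB.CCCC.DDDD \
    --target-dir /test-sqoop \
    -m 1 --verbose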

The command only finishes successfully when the input file is smaller than 500 MB (the txt file is created on HDFS). If I try to import a file larger than 500 MB, the Sqoop job does not start and I get this error:

  

220-FTPD1 IBM FTP CS V2R1 at zhostname, 2017-03-28 14:26:24.
220 Connection will close if idle for more than 30 minutes.
4606 [main] INFO org.apache.sqoop.util.MainframeFTPClientUtils - Connecting to zhostname on 21
USER *******
331 Send password please.
PASS *******
230 USERNAME is logged on. Working directory is "USERNAME.".
TYPE A
200 Representation type is Ascii NonPrint
CWD 'AAAA.BBBB.CCCC.DDDD'
250 "AAAA.BBBB.CCCC.DDDD." is the working directory name prefix.
PASV
227 Entering Passive Mode (192,168,20,1,123,54)
LIST
125 List started OK
250 List completed successfully.
NOOP
200 OK
QUIT
221 Quit command received. Goodbye.
4688 [main] ERROR org.apache.sqoop.tool.ImportTool - Encountered IOException running import job: java.io.IOException: No sequential datasets retrieved from AAAA.BBBB.CCCC.DDDD
    at org.apache.sqoop.mapreduce.mainframe.MainframeDatasetInputFormat.getSplits(MainframeDatasetInputFormat.java:65)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
    at org.apache.sqoop.manager.MainframeManager.importTable(MainframeManager.java:97)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
    at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
    at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
    at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I am sure the dataset exists.
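
For reference, the LIST exchange that Sqoop performs (visible in the verbose log above) can be replayed by hand with a standard FTP client to see exactly what the host returns for this dataset. This is only a sketch of such a session, using the same placeholder host, user and dataset name as above:

ftp> open zhostname
ftp> user USERNAME
ftp> quote CWD 'AAAA.BBBB.CCCC.DDDD'
ftp> dir

The dir command issues the same LIST request that appears in the Sqoop log just before the "No sequential datasets retrieved" error.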

The environment is Cloudera CDH 5.8.3. Am I missing some configuration? Thanks!

0 Answers:

There are no answers.