Sqoop - Import job failed

Date: 2018-07-19 05:25:36

Tags: docker hadoop hive sqoop cloudera-quickstart-vm

I am trying to import a table of 32 million records from SQL Server into Hive via Sqoop. The connection to SQL Server succeeds, but the Map/Reduce job fails to run to completion. It gives the following error:



18/07/19 04:00:11 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
18/07/19 04:00:27 DEBUG db.DBConfiguration: Fetching password from job credentials store
18/07/19 04:00:27 INFO db.DBInputFormat: Using read commited transaction isolation
18/07/19 04:00:27 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
18/07/19 04:00:28 INFO mapreduce.JobSubmitter: number of splits:1
18/07/19 04:00:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1531917395459_0002
18/07/19 04:00:30 INFO impl.YarnClientImpl: Submitted application application_1531917395459_0002
18/07/19 04:00:30 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1531917395459_0002/
18/07/19 04:00:30 INFO mapreduce.Job: Running job: job_1531917395459_0002
18/07/19 04:43:02 INFO mapreduce.Job: Job job_1531917395459_0002 running in uber mode : false
18/07/19 04:43:03 INFO mapreduce.Job:  map 0% reduce 0%
18/07/19 04:43:04 INFO mapreduce.Job: Job job_1531917395459_0002 failed with state FAILED due to: Application application_1531917395459_0002 failed 2 times due to ApplicationMaster for attempt appattempt_1531917395459_0002_000002 timed out. Failing the application.
18/07/19 04:43:08 INFO mapreduce.Job: Counters: 0
18/07/19 04:43:08 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/07/19 04:43:09 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 2,576.6368 seconds (0 bytes/sec)
18/07/19 04:43:10 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/07/19 04:43:10 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/07/19 04:43:10 ERROR tool.ImportTool: Error during import: Import job failed!

At first the job got stuck while connecting to the ResourceManager at 0.0.0.0:8032, so I changed the host to 127.0.0.1. Execution then proceeded, but the error above occurred. I even tried running the job with only 1,000 rows, and got the same error. On top of that, the job sometimes gets killed.
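For reference, a quick way to sanity-check the ResourceManager address and pull the failed attempt's diagnostics (a sketch assuming the stock YARN CLI inside the quickstart container):

    # Lists registered NodeManagers; empty output means the RM address is
    # wrong or the NodeManager never connected
    yarn node -list

    # Prints the application's final status and its diagnostics string
    yarn application -status application_1531917395459_0002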

Here is the configuration in my yarn-site.xml file:

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.dispatcher.exit-on-error</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>

  <property>
    <description>Classpath for typical applications.</description>
     <name>yarn.application.classpath</name>
     <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
     </value>
  </property>

  <!-- added by me -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8031</value>
  </property>

</configuration>
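Since the visible failure is an ApplicationMaster liveness timeout, the property governing that timeout may also be relevant; it is not set in the file above, so the stock default applies. (The property name and its 600000 ms default come from the standard YARN defaults; raising it would only mask a stuck AM rather than fix it.)

    <property>
      <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
      <value>600000</value>
    </property>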

Here is my sqoop command:

sqoop import --connect "jdbc:sqlserver://system-ip;databaseName=TEST" --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --username user1 --password password --hive-import --create-hive-table --hive-table "customer_data_1000" --table "customer_data_1000" --split-by Account_Branch_Converted -m 1 --verbose
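As an aside, and only as an assumption from stock Sqoop behavior (nothing in the logs above confirms it matters here): passing --driver explicitly makes Sqoop fall back to its generic JDBC connection manager. Dropping the flag lets Sqoop pick its built-in SQL Server manager from the jdbc:sqlserver:// URL:

    # Same import without --driver, so Sqoop selects SQLServerManager itself
    sqoop import \
      --connect "jdbc:sqlserver://system-ip;databaseName=TEST" \
      --username user1 --password password \
      --hive-import --create-hive-table --hive-table "customer_data_1000" \
      --table "customer_data_1000" \
      --split-by Account_Branch_Converted -m 1 --verbose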

And here is my docker command, just in case:

docker run --hostname=quickstart.cloudera --privileged=true -t -p 127.0.0.1:8888:8888 -p 127.0.0.1:7180:7180 -p 127.0.0.1:50070:50070 -i 7c41929668d8 /usr/bin/docker-quickstart

Where am I going wrong?

1 answer:

Answer 0 (score: 0)

I can't give you an exact solution, but I can tell you what the likely root causes are (a quick sketch of each check follows the list):

  1. Try running the Sqoop job as a non-root user.
  2. Check that the JDK is installed correctly on the host and that JAVA_HOME is set properly.
  3. Check that you have granted the correct permissions on the database you are using.
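A minimal sketch of those checks (the user name is a placeholder; adapt to your environment):

    # 1. Run the job as a non-root user ("sqoopuser" is hypothetical);
    #    re-run the full import command from the question prefixed with sudo -u
    sudo -u sqoopuser sqoop version    # smoke test: sqoop works as that user

    # 2. Verify the JDK and JAVA_HOME on the host
    java -version
    echo $JAVA_HOME

    # 3. Verify the grants for the import user; Sqoop needs at least
    #    SELECT on the source table, e.g. in SQL Server:
    #    GRANT SELECT ON dbo.customer_data_1000 TO user1;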

Your job is failing due to one of the reasons above. Containers are being created, since you have enough vcores and free memory, so from the processing side everything is fine; there must be a configuration error somewhere.
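If the container side really is healthy, the ApplicationMaster's own logs are where that configuration error will show up; with log aggregation enabled (as in the yarn-site.xml above), they can be pulled after the job fails:

    # Dump aggregated container logs for the failed application
    yarn logs -applicationId application_1531917395459_0002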