MapReduce job stuck in RUNNING state

Time: 2019-09-27 17:53:17

Tags: hadoop oozie

I am trying to run a MapReduce program from Oozie (4.1.0).

However, the job's status shows RUNNING and never changes.

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="simple-Workflow">
   <start to="RunMapreduceJob" />
   <action name="RunMapreduceJob">
      <map-reduce>
         <job-tracker>localhost:8088</job-tracker>
         <name-node>hdfs://localhost:9000</name-node>
         <prepare>
            <delete path="hdfs://localhost:9000/dataoutput"/>
         </prepare>
         <configuration>
            <property>
               <name>mapred.job.queue.name</name>
               <value>default</value>
            </property>
            <property>
               <name>mapred.mapper.class</name>
               <value>DataDividerByUser.DataDividerMapper</value>
            </property>
            <property>
               <name>mapred.reducer.class</name>
               <value>DataDividerByUser.DataDividerReducer</value>
            </property>
            <property>
               <name>mapred.output.key.class</name>
               <value>org.apache.hadoop.io.IntWritable</value>
            </property>
            <property>
               <name>mapred.output.value.class</name>
               <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
               <name>mapred.input.dir</name>
               <value>/data</value>
            </property>
            <property>
               <name>mapred.output.dir</name>
               <value>/dataoutput</value>
            </property>
         </configuration>
      </map-reduce>
      <ok to="end" />
      <error to="fail" />
   </action>
   <kill name="fail">
      <message>Mapreduce program Failed</message>
   </kill>
   <end name="end" />
</workflow-app>

job.properties

nameNode=hdfs://localhost:9000
jobTracker=localhost:8088
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/Config

The job tracker is also running; here is a screenshot: https://prnt.sc/pbvb5i

Fetching the job info from the Oozie URL gives this error:

JA009: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "ec2-18-222-170-204.us-east-2.compute.amazonaws.com/18.222.170.204"; destination host is: "localhost":8088;

May I know what is wrong?

Update:

All nodes are now working fine: https://prnt.sc/pc4a7n

Oozie log

hdfs://localhost:9000/user/hduser/share/lib/lib_20190928171545/sqoop/oozie-sharelib-sqoop-4.1.0.jar, hdfs://localhost:9000/user/hduser/share/lib/lib_20190928171545/sqoop/sqoop-1.4.3-hadoop100.jar]
2019-09-28 17:34:29,232  INFO Services:541 - SERVER[localhost] Initialized
2019-09-28 17:34:29,234  INFO Services:541 - SERVER[localhost] Running with JARs for Hadoop version [2.3.0]
2019-09-28 17:34:29,234  INFO Services:541 - SERVER[localhost] Oozie System ID [oozie-hdus] started!
2019-09-28 17:34:29,526  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,526  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,536  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,536  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,560  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,560  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,562  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:29,562  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:34:39,222  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:34:39,224  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running coordinator status service first instance
2019-09-28 17:34:39,222  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:34:39,519  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running bundle status service first instance
2019-09-28 17:34:39,521  INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:541 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr Date= 2019-09-28T12:09Z, Num jobs to materialize = 0
2019-09-28 17:34:39,521  INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:541 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.CoordMaterializeTriggerService]
2019-09-28 17:34:39,570  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:34:39,571  INFO PurgeXCommand:541 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [30] days, Coordinator Jobs older than [7] days, and Bundlejobs older than [7] days.
2019-09-28 17:34:39,571  INFO PurgeXCommand:541 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] ENDED Purge deleted [0] workflows, [0] coordinatorActions, [0] coordinators, [0] bundles
2019-09-28 17:34:39,639  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:35:39,571  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:35:39,572  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running coordinator status service from last instance time =  2019-09-28T12:04Z
2019-09-28 17:35:39,616  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running bundle status service from last instance time =  2019-09-28T12:04Z
2019-09-28 17:35:39,630  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:35:39,647  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:35:39,662  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:35:39,814  INFO ActionStartXCommand:541 - SERVER[localhost] USER[hduser] GROUP[-] TOKEN[] APP[simple-Workflow] JOB[0000000-190928171702728-oozie-hdus-W] ACTION[0000000-190928171702728-oozie-hdus-W@RunMapreduceJob] Start action [0000000-190928171702728-oozie-hdus-W@RunMapreduceJob] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2019-09-28 17:36:39,631  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:36:39,632  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running coordinator status service from last instance time =  2019-09-28T12:05Z
2019-09-28 17:36:39,639  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running bundle status service from last instance time =  2019-09-28T12:05Z
2019-09-28 17:36:39,643  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:36:39,663  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:36:39,685  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:37:39,644  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:37:39,645  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running coordinator status service from last instance time =  2019-09-28T12:06Z
2019-09-28 17:37:39,656  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running bundle status service from last instance time =  2019-09-28T12:06Z
2019-09-28 17:37:39,661  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:37:39,686  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:37:39,705  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:37:53,297  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,297  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,299  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,299  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,312  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,312  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,478  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,478  WARN AuthenticationFilter:341 - SERVER[localhost] AuthenticationToken ignored: AuthenticationToken expired
2019-09-28 17:37:53,631  WARN ParameterVerifier:544 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal parameters in its XML definition
2019-09-28 17:37:53,893  INFO ActionStartXCommand:541 - SERVER[localhost] USER[hduser] GROUP[-] TOKEN[] APP[simple-Workflow] JOB[0000000-190928173423962-oozie-hdus-W] ACTION[0000000-190928173423962-oozie-hdus-W@:start:] Start action [0000000-190928173423962-oozie-hdus-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2019-09-28 17:37:53,895  INFO ActionStartXCommand:541 - SERVER[localhost] USER[hduser] GROUP[-] TOKEN[] APP[simple-Workflow] JOB[0000000-190928173423962-oozie-hdus-W] ACTION[0000000-190928173423962-oozie-hdus-W@:start:] [***0000000-190928173423962-oozie-hdus-W@:start:***]Action status=DONE
2019-09-28 17:37:53,895  INFO ActionStartXCommand:541 - SERVER[localhost] USER[hduser] GROUP[-] TOKEN[] APP[simple-Workflow] JOB[0000000-190928173423962-oozie-hdus-W] ACTION[0000000-190928173423962-oozie-hdus-W@:start:] [***0000000-190928173423962-oozie-hdus-W@:start:***]Action updated in DB!
2019-09-28 17:37:54,128  INFO ActionStartXCommand:541 - SERVER[localhost] USER[hduser] GROUP[-] TOKEN[] APP[simple-Workflow] JOB[0000000-190928173423962-oozie-hdus-W] ACTION[0000000-190928173423962-oozie-hdus-W@RunMapreduceJob] Start action [0000000-190928173423962-oozie-hdus-W@RunMapreduceJob] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2019-09-28 17:38:39,662  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:38:39,663  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running coordinator status service from last instance time =  2019-09-28T12:07Z
2019-09-28 17:38:39,671  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running bundle status service from last instance time =  2019-09-28T12:07Z
2019-09-28 17:38:39,677  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:38:39,706  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:38:39,722  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:39:39,527  INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:541 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr Date= 2019-09-28T12:14Z, Num jobs to materialize = 0
2019-09-28 17:39:39,528  INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:541 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.CoordMaterializeTriggerService]
2019-09-28 17:39:39,679  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:39:39,680  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running coordinator status service from last instance time =  2019-09-28T12:08Z
2019-09-28 17:39:39,687  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Running bundle status service from last instance time =  2019-09-28T12:08Z
2019-09-28 17:39:39,691  INFO StatusTransitService$StatusTransitRunnable:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2019-09-28 17:39:39,723  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2019-09-28 17:39:39,743  INFO PauseTransitService:541 - SERVER[localhost] USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]

But the MapReduce action still stays in the RUNNING (PREP) state. Eventually it fails with this error:

JA009: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "localhost/127.0.0.1"; destination host is: "localhost":8088; 
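
One thing that may be worth checking, offered as a guess rather than a confirmed fix: 8088 is the default port of the YARN ResourceManager web UI (HTTP), while the ResourceManager's default RPC port (yarn.resourcemanager.address) is 8032. An InvalidProtocolBufferException like the one above is commonly seen when a Hadoop RPC client connects to a port that does not speak Hadoop RPC. Below is a minimal sketch of the relevant part of workflow.xml, assuming this cluster keeps the default RPC port; the actual value set in yarn-site.xml is not shown in this post and would take precedence.

```xml
<!-- Sketch only. Assumption: yarn.resourcemanager.address uses the
     default RPC port 8032 (not confirmed for this cluster). -->
<map-reduce>
   <!-- Point Oozie at the ResourceManager RPC endpoint, not the web UI on 8088 -->
   <job-tracker>localhost:8032</job-tracker>
   <name-node>hdfs://localhost:9000</name-node>
   <!-- prepare and configuration elements unchanged -->
</map-reduce>
```

Since workflow.xml hardcodes the address instead of referencing ${jobTracker}, the jobTracker entry in job.properties would presumably need the same value to stay consistent.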

Update 1: output of hadoop fs -ls hdfs://localhost:9000

drwxr-xr-x   - hduser supergroup          0 2019-09-28 19:44 hdfs://localhost:9000/user/hduser/oozie-hdus
drwxr-xr-x   - hduser supergroup          0 2019-09-28 17:15 hdfs://localhost:9000/user/hduser/share

0 answers:

No answers yet