Oozie runs MR job successfully, but never receives the status update

Date: 2018-08-22 16:31:47

Tags: apache hadoop yarn oozie

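I have a docker network of containers running a namenode, a datanode, the job history server, the YARN master, Oozie, and MySQL. Oozie can submit jobs to my Hadoop cluster successfully, and the jobs succeed, but the job history log shows the connection to the Oozie callback being refused. A little while later, the Oozie web interface and the Oozie instance stop working, and any command such as "oozie job -info" gets connection refused. The first call below still works; the second one, made later, fails: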

bash-4.2$ oozie job -info 0000000-180822162217556-oozie-W
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.authentication.client.KerberosAuthenticator).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Job ID : 0000000-180822162217556-oozie-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : WorkflowRunnerTest
App Path      : hdfs://namenode:8020/user/hadoop/oozie-jobs/WordCountTest
Status        : RUNNING
Run           : 0
User          : hadoop
Group         : -
Created       : 2018-08-22 16:22 GMT
Started       : 2018-08-22 16:22 GMT
Last Modified : 2018-08-22 16:23 GMT
Ended         : -
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-180822162217556-oozie-W@:start:                                       OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------
0000000-180822162217556-oozie-W@intersection0                                 RUNNING   job_1534954806897_0001 RUNNING    -
------------------------------------------------------------------------------------------------------------------------------------

bash-4.2$ oozie job -info 0000000-180822162217556-oozie-W
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry count = 2
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 4 sec. Retry count = 3
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 8 sec. Retry count = 4

The job history log for this job is as follows:

Showing 4096 bytes of 69256 total. Click here for the full log.

eds:0 ContAlloc:4 ContRel:0 HostLocal:3 RackLocal:0
2018-08-22 16:25:36,630 INFO [Thread-73] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://namenode:8020 /tmp/hadoop-yarn/staging/hadoop/.staging/job_1534954806897_0002
2018-08-22 16:25:36,636 INFO [Thread-73] org.apache.hadoop.ipc.Server: Stopping server on 35021
2018-08-22 16:25:36,638 INFO [IPC Server listener on 35021] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 35021
2018-08-22 16:25:36,639 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2018-08-22 16:25:36,639 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
2018-08-22 16:25:36,641 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-08-22 16:25:36,653 INFO [Thread-73] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job end notification started for jobID : job_1534954806897_0002
2018-08-22 16:25:36,654 INFO [Thread-73] org.mortbay.log: Job end notification attempts left 0
2018-08-22 16:25:36,654 INFO [Thread-73] org.mortbay.log: Job end notification trying http://oozie:11000/oozie/callback?id=0000000-180822162217556-oozie-W@intersection0&status=SUCCEEDED
2018-08-22 16:25:36,663 WARN [Thread-73] org.mortbay.log: Job end notification to http://oozie:11000/oozie/callback?id=0000000-180822162217556-oozie-W@intersection0&status=SUCCEEDED failed
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1199)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.apache.hadoop.mapreduce.v2.app.JobEndNotifier.notifyURLOnce(JobEndNotifier.java:130)
    at org.apache.hadoop.mapreduce.v2.app.JobEndNotifier.notify(JobEndNotifier.java:174)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.sendJobEndNotify(MRAppMaster.java:686)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:654)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:728)
2018-08-22 16:25:37,666 WARN [Thread-73] org.mortbay.log: Job end notification failed to notify : http://oozie:11000/oozie/callback?id=0000000-180822162217556-oozie-W@intersection0&status=SUCCEEDED
2018-08-22 16:25:42,667 INFO [Thread-73] org.apache.hadoop.ipc.Server: Stopping server on 41027
2018-08-22 16:25:42,668 INFO [IPC Server listener on 41027] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 41027
2018-08-22 16:25:42,670 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-08-22 16:25:42,678 INFO [Thread-73] org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:0

Is there any particular reason that could cause this?

Here is the output of oozie.log:

2018-08-22 20:25:21,367  INFO Services:520 - SERVER[oozie] Initialized
2018-08-22 20:25:21,369  INFO Services:520 - SERVER[oozie] Running with JARs for Hadoop version [2.6.5]
2018-08-22 20:25:21,369  INFO Services:520 - SERVER[oozie] Oozie System ID [oozie] started!
2018-08-22 20:25:31,345  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:25:31,345  INFO PauseTransitService:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:25:31,348  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running coordinator status service first instance
2018-08-22 20:25:31,609  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running bundle status service first instance
2018-08-22 20:25:31,637  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:25:31,641  INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr Date= 2018-08-22T20:30Z, Num jobs to materialize = 0
2018-08-22 20:25:31,648  INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.CoordMaterializeTriggerService]
2018-08-22 20:25:31,723  INFO PurgeXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [30] days, Coordinator Jobs older than [7] days, and Bundlejobs older than [7] days.
2018-08-22 20:25:31,723  INFO PurgeXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] ENDED Purge deleted [0] workflows, [0] coordinatorActions, [0] coordinators, [0] bundles
2018-08-22 20:25:31,746  INFO PauseTransitService:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:26:31,571  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:26:31,572  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running coordinator status service from last instance time =  2018-08-22T20:25Z
2018-08-22 20:26:31,614  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running bundle status service from last instance time =  2018-08-22T20:25Z
2018-08-22 20:26:31,641  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:26:31,676  INFO PauseTransitService:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:26:31,708  INFO PauseTransitService:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:27:31,571  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:27:31,572  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running coordinator status service from last instance time =  2018-08-22T20:26Z
2018-08-22 20:27:31,584  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Running bundle status service from last instance time =  2018-08-22T20:26Z
2018-08-22 20:27:31,589  INFO StatusTransitService$StatusTransitRunnable:520 - SERVER[oozie] Released lock for [org.apache.oozie.service.StatusTransitService]
2018-08-22 20:27:31,639  INFO PauseTransitService:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:27:31,661  INFO PauseTransitService:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2018-08-22 20:27:47,241  INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] Start action [0000000-180822202517586-oozie-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-08-22 20:27:47,242  INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] [***0000000-180822202517586-oozie-W@:start:***]Action status=DONE
2018-08-22 20:27:47,242  INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] [***0000000-180822202517586-oozie-W@:start:***]Action updated in DB!
2018-08-22 20:27:47,394  INFO WorkflowNotificationXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000000-180822202517586-oozie-W
2018-08-22 20:27:47,394  INFO WorkflowNotificationXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000000-180822202517586-oozie-W@:start:
2018-08-22 20:27:47,432  INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] Start action [0000000-180822202517586-oozie-W@intersection0] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2018-08-22 20:27:47,507  INFO HadoopAccessorService:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] Processing configuration file [/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/action-conf/default.xml] for action[default] and hostPort [*]
2018-08-22 20:27:47,508  INFO HadoopAccessorService:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] Processing configuration file [/opt/app-root/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/action-conf/map-reduce.xml] for action [map-reduce] and hostPort [*]
2018-08-22 20:27:48,482  WARN JobResourceUploader:64 - SERVER[oozie] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-08-22 20:27:48,493  WARN JobResourceUploader:171 - SERVER[oozie] No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2018-08-22 20:27:50,173  INFO MapReduceActionExecutor:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] checking action, hadoop job ID [job_1534969405649_0001] status [RUNNING]
2018-08-22 20:27:50,175  INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] [***0000000-180822202517586-oozie-W@intersection0***]Action status=RUNNING
2018-08-22 20:27:50,176  INFO ActionStartXCommand:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] [***0000000-180822202517586-oozie-W@intersection0***]Action updated in DB!
2018-08-22 20:27:50,208  INFO WorkflowNotificationXCommand:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] No Notification URL is defined. Therefore nothing to notify for job 0000000-180822202517586-oozie-W@intersection0
2018-08-22 20:28:02,437  INFO CallbackServlet:520 - SERVER[oozie] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] callback for action [0000000-180822202517586-oozie-W@intersection0]
2018-08-22 20:28:06,269  INFO MapReduceActionExecutor:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] External ID swap, old ID [job_1534969405649_0001] new ID [job_1534969405649_0002]
2018-08-22 20:28:06,273  INFO MapReduceActionExecutor:520 - SERVER[oozie] USER[hadoop] GROUP[-] TOKEN[] APP[WorkflowRunnerTest] JOB[0000000-180822202517586-oozie-W] ACTION[0000000-180822202517586-oozie-W@intersection0] checking action, hadoop job ID [job_1534969405649_0002] status [RUNNING]

1 answer:

Answer 0 (score: 0)

Have you tried adding the -oozie option to the oozie job -info command? That is: -oozie $OOZIE_URL

where OOZIE_URL is a variable set to the actual Oozie URL.
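
A minimal sketch of that suggestion. The server URL http://oozie:11000/oozie is an assumption, inferred from the callback URL in the job history log; adjust it to your setup. The job ID is the one from the question. When -oozie is omitted, the oozie CLI falls back to the OOZIE_URL environment variable, so setting both here is redundant but makes the target explicit.

```shell
#!/usr/bin/env bash
# Assumption: server URL inferred from the callback URL in the job history log
# (http://oozie:11000/oozie/callback?...); change it if your server differs.
export OOZIE_URL=http://oozie:11000/oozie

# Workflow job ID taken from the question.
JOB_ID=0000000-180822162217556-oozie-W

if command -v oozie >/dev/null 2>&1; then
  # Pass the server URL explicitly instead of relying on the environment.
  oozie job -oozie "$OOZIE_URL" -info "$JOB_ID"
else
  echo "oozie CLI not found on PATH" >&2
fi
```

If this still reports "Connection refused", the client is pointing at the right address and the server itself is probably not listening on that port any more.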