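I am running a Spark job that copies data from a remote HDFS to my own HDFS.
I have an Oozie coordinator that checks once a day whether data is available in a specified directory on the remote HDFS and, if it is, runs the workflow.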
coordinator.xml:
<coordinator-app name="My App" frequency="${coord:days(1)}" start="${startTime}" end="${endTime}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
    <datasets>
        <dataset name="hdfsDirectory" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="UTC">
            <uri-template>${hdfsDirectoryToPoll}/partition=${YEAR}-${MONTH}-${DAY}</uri-template>
            <done-flag></done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="sourceFile" dataset="hdfsDirectory">
            <start-instance>${coord:current(-1)}</start-instance>
            <end-instance>${coord:current(0)}</end-instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowPath}</app-path>
            <configuration>
                <property>
                    <name>source</name>
                    <value>${coord:dataIn('sourceFile')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
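To make the polling concrete, here is a rough Python sketch of which directories the coordinator's input check would look for, given the uri-template and the `coord:current(-1)` to `coord:current(0)` range above. The helpers `resolve_instance` and `polled_paths` are hypothetical names used only for illustration and are not part of Oozie. The nominal time is also chosen only for illustration, and the sketch does not model how Oozie treats instances that fall before the dataset's initial-instance.

```python
from datetime import datetime, timedelta, timezone


def resolve_instance(base_uri, nominal_time, offset_days):
    """Expand the uri-template for one daily dataset instance.

    Hypothetical helper: mimics ${YEAR}-${MONTH}-${DAY} substitution for
    coord:current(offset_days) with a one-day dataset frequency.
    """
    instance = nominal_time + timedelta(days=offset_days)
    return "{}/partition={}".format(base_uri, instance.strftime("%Y-%m-%d"))


def polled_paths(base_uri, nominal_time, start_offset=-1, end_offset=0):
    """List every path between start-instance and end-instance, inclusive."""
    return [
        resolve_instance(base_uri, nominal_time, n)
        for n in range(start_offset, end_offset + 1)
    ]


if __name__ == "__main__":
    # Illustrative nominal time only; not taken from an actual coordinator run.
    nominal = datetime(2018, 8, 10, 9, 0, tzinfo=timezone.utc)
    for path in polled_paths("hdfs://remoteNode/path/to/data", nominal):
        print(path)
```

These existence checks are made by the Oozie server itself (the `FSURIHandler.exists` frames in the stack trace further down), not by the Spark application, so a kinit inside the Spark job does not cover them.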
workflow.xml:
<action name="action1">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>${remoteNode}</host>
        <command>${sparkSubmitCommand}</command>
    </ssh>
    <ok to="end" />
    <error to="kill" />
</action>
<kill name="kill">
    <message>Action failed, error message - [${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
job.properties:
remoteNode=remoteNode
nameNode=hdfs://test
jobTracker=test:8050
hdfsDirectoryToPoll=hdfs://remoteNode/path/to/data
sparkSubmitCommand=spark-submit spark-jar.jar
oozie.coord.application.path=${nameNode}/path/to/workflow
oozie.use.system.libpath=true
workflowPath=${nameNode}/path/to/workflow
startTime=2018-08-09T09:00Z
endTime=2018-08-10T09:00Z
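My problem is that the remote cluster is Kerberized. I run kinit inside the Spark application and that works fine, but I need to do the same thing in the coordinator.
Here is the error: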
2020-04-14 14:18:51,437 ERROR CoordOldInputDependency:517 - SERVER[<server>] USER[-] GROUP[-] TOKEN[-] APP[-]637-oozie-oozi-C@1] org.apache.oozie.service.HadoopAccessorException: E0902: Exception occured: [org.apache.hadoop.ipc.RemoteExceptionrized connection for super-user: oozie/<server>@<realm> from IP <ip>]
org.apache.oozie.service.HadoopAccessorException: E0902: Exception occured: [org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.s super-user: oozie/<server>@<realm> from IP <ip>]
at org.apache.oozie.dependency.FSURIHandler.exists(FSURIHandler.java:113)
at org.apache.oozie.command.coord.CoordCommandUtils.pathExists(CoordCommandUtils.java:877)
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.pathExists(CoordOldInputDependency.java:220)
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.checkListOfPaths(CoordOldInputDependency.java:200)
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.checkPullMissingDependencies(CoordOldInputDependency.java:1
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.checkResolvedInput(CoordActionInputCheckXCommand.java:323)
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.execute(CoordActionInputCheckXCommand.java:173)
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.execute(CoordActionInputCheckXCommand.java:63)
at org.apache.oozie.command.XCommand.call(XCommand.java:287)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unauthorized connectionm IP <ip>
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy31.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:82
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy32.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165)
at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1454)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.oozie.dependency.FSURIHandler.exists(FSURIHandler.java:101)
... 13 more
2020-04-14 14:18:51,442 ERROR CoordActionInputCheckXCommand:517 - SERVER[<server>] USER[-] GROUP[-] TOKEN[-] 163133637-oozie-oozi-C@1] XException,
org.apache.oozie.command.CommandException: E1021: Coord Action Input Check Error: org.apache.oozie.service.HadoopAccessorException: E0che.hadoop.security.authorize.AuthorizationException): Unauthorized connection for super-user: oozie/<server>
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.execute(CoordActionInputCheckXCommand.java:237)
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.execute(CoordActionInputCheckXCommand.java:63)
at org.apache.oozie.command.XCommand.call(XCommand.java:287)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.oozie.service.HadoopAccessorException: E0902: Exception occured: [org.apache.hadoop.ipc.Remon): Unauthorized connection for super-user: oozie/<server>@<realm> from IP <ip>]
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.pathExists(CoordOldInputDependency.java:232)
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.checkListOfPaths(CoordOldInputDependency.java:200)
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.checkPullMissingDependencies(CoordOldInputDependency.java:1
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.checkResolvedInput(CoordActionInputCheckXCommand.java:323)
at org.apache.oozie.command.coord.CoordActionInputCheckXCommand.execute(CoordActionInputCheckXCommand.java:173)
... 7 more
Caused by: org.apache.oozie.service.HadoopAccessorException: E0902: Exception occured: [org.apache.hadoop.ipc.RemoteException(org.apacnection for super-user: oozie/<server>@<realm> from IP <ip>]
at org.apache.oozie.dependency.FSURIHandler.exists(FSURIHandler.java:113)
at org.apache.oozie.command.coord.CoordCommandUtils.pathExists(CoordCommandUtils.java:877)
at org.apache.oozie.coord.input.dependency.CoordOldInputDependency.pathExists(CoordOldInputDependency.java:220)
... 11 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unauthorized connectionm IP <ip>
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy31.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:82
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy32.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165)
at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1454)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.oozie.dependency.FSURIHandler.exists(FSURIHandler.java:101)
... 13 more
Any suggestions? Can we provide a custom oozie-site.xml to the coordinator?