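I am trying to run a Flink 1.5 streaming job on an EMR cluster. The cluster has 20 m4.4xlarge nodes. I submit the job with the command below. The program dies after about 16 minutes of running, although it does produce output while it is running. Below is the only exception I could collect; there are no other exceptions in the taskmanager or jobmanager logs.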
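I use the following logic to calculate memory and parallelism. An m4.4xlarge instance has about 64 GB of memory and 16 vCPUs. The maximum memory for a TaskManager is 16 GB, so I allocated 14 GB to each TaskManager. TaskManagers per node = 57 GB / 16 GB = 3 (rounded down). Multiplying this by the 20 nodes in the EMR cluster gives 60 TaskManagers in total. Since each node runs 3 TaskManagers and has 16 vCPUs, slots per TaskManager = 16 / 3 = 5 (rounded down). I set the parallelism to 3 * 20 * 5 = 300. Any input on how to debug this further is appreciated.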
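The sizing arithmetic above can be reproduced with a small bash sketch. It uses bash integer division, which truncates, to mirror the rounding in "57 / 16 = 3" and "16 / 3 = 5". The value NODE_MEM_GB=57 is taken from the figure quoted above and is assumed to mean GB of usable memory per node; all variable names are hypothetical and only serve to label the numbers.

```shell
#!/usr/bin/env bash
# Sketch of the TaskManager / slot / parallelism arithmetic described above.
# Bash $(( )) arithmetic truncates, matching the rounding used in the question.
NODES=20            # nodes in the EMR cluster
NODE_MEM_GB=57      # assumed usable memory per m4.4xlarge node (figure from the question)
NODE_VCPUS=16       # vCPUs per m4.4xlarge node
TM_MAX_GB=16        # per-TaskManager upper bound used in the calculation
TM_MEM_MB=14336     # 14 GB actually requested per TaskManager (-ytm)

TM_PER_NODE=$(( NODE_MEM_GB / TM_MAX_GB ))     # 57 / 16 -> 3
NUM_TM=$(( TM_PER_NODE * NODES ))              # 3 * 20  -> 60  (-yn)
SLOTS_PER_TM=$(( NODE_VCPUS / TM_PER_NODE ))   # 16 / 3  -> 5   (-ys)
PARALLELISM=$(( NUM_TM * SLOTS_PER_TM ))       # 60 * 5  -> 300 (--parallelism)

echo "TaskManagers per node: $TM_PER_NODE"
echo "flink flags: -yn $NUM_TM -ys $SLOTS_PER_TM -ytm $TM_MEM_MB, parallelism $PARALLELISM"
```

Running it prints the same 60 / 5 / 300 values that are passed to `flink run` in the submission command, so the flags are at least consistent with the stated reasoning.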
flink run -m yarn-cluster -yn 60 -ys 5 -ytm 14336 -yjm 8096 --class myApp myApp.jar --kinesisStreamNames stream1-us-east-1-alfa,stream2-us-east-1-alfa --kinesisEndpointURL https://kinesis.us-east-1.amazonaws.com:443 --schemaRegion us-east-1 --schemaBucket smt-schemas --windowSizeInMillis 50000 --outputBasePath s3://smt-some-out/output --sinkType s3 --sourceType kinesis --awsAccessKey someAccess --awsSecretAccessKey someSecurity --parallelism 300 --enableCheckpointing false
org.apache.flink.client.program.ProgramInvocationException: org.apache.flink.util.FlinkException: Releasing TaskManager container_1532437716186_0004_01_000008.
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
at com.smaato.sessionizer.flink.stream.service.StreamSessionizationService.execute(StreamSessionizationService.scala:92)
at com.smaato.sessionizer.flink.stream.StreamSessionizationApp$.main(StreamSessionizationApp.scala:37)
at com.smaato.sessionizer.flink.stream.StreamSessionizationApp.main(StreamSessionizationApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:781)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:275)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1020)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)
Caused by: org.apache.flink.util.FlinkException: Releasing TaskManager container_1532437716186_0004_01_000008.
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067)
at org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)