Starting a batch job from a streaming job

Asked: 2018-06-02 16:05:47

Tags: java apache-flink flink-streaming

Hi, I have a Maven project that does Flink stream processing. Based on the messages I receive from the stream I kick off a batch job, but at the moment I am getting an error.

I am new to the Flink world, so if you have any ideas please let me know. Below is the code I run against the standalone cluster.

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Kafka source: read the raw message stream with a parallelism of 3.
    KafkaConsumerService kafkaConsumerService = new KafkaConsumerService();
    FlinkKafkaConsumer010<String> kafkaConsumer = kafkaConsumerService.getKafkaConsumer(settings);
    DataStream<String> messageStream = env.addSource(kafkaConsumer).setParallelism(3);

    // Filter the interesting messages, use a map to trigger the batch job,
    // and discard the stream elements afterwards.
    messageStream
            .filter(new MyFilter()).setParallelism(3).name("Filter")
            .map(new ProcessFile(arg)).setParallelism(3).name("start batch")
            .addSink(new DiscardingSink<String>()).setParallelism(3).name("DiscardData");

    env.execute("Stream processor");
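
For reference, `KafkaConsumerService` is not shown in the question. Below is a minimal sketch of what such a helper might look like, assuming a plain string deserializer and a hypothetical `Settings` object that carries the usual Kafka properties (all names in it are illustrative, not the asker's actual code):

    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class KafkaConsumerService {
        // Builds a Kafka 0.10 source; topic, brokers, and group id come from settings.
        public FlinkKafkaConsumer010<String> getKafkaConsumer(Settings settings) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", settings.getBootstrapServers());
            props.setProperty("group.id", settings.getGroupId());
            return new FlinkKafkaConsumer010<>(settings.getTopic(), new SimpleStringSchema(), props);
        }
    }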

// The ProcessFile map class

    public ProcessFile(String arg) { }

    @Override
    public String map(String message) throws Exception {
        MessageType typedmessage = ParseMessage(message);
        if (isWhatIwant()) {
            String[] batchArgs = createBatchArgs();

            // Point a client at the remote JobManager and submit the batch job
            // in detached mode so the map call does not block on its result.
            Configuration config = new Configuration();
            config.setString(JobManagerOptions.ADDRESS, jobMasterHost);
            config.setInteger(JobManagerOptions.PORT, jobMasterPort);

            StandaloneClusterClient client = new StandaloneClusterClient(config);
            client.setDetached(true);
            PackagedProgram program = new PackagedProgram(
                    new File(jarLocation), SupplyBatchJob.class.getName(), batchArgs);
            client.run(program, 7);
        }

        // map() is declared to return String, so pass the message through;
        // the downstream DiscardingSink drops it anyway.
        return message;
    }
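
`SupplyBatchJob` is not shown either. The entry class handed to `PackagedProgram` only needs a standard `main` that builds a batch plan from the forwarded `batchArgs`; here is a rough sketch under that assumption (the argument layout and paths are made up):

    import org.apache.flink.api.java.ExecutionEnvironment;

    public class SupplyBatchJob {
        public static void main(String[] args) throws Exception {
            // args is the batchArgs array forwarded through PackagedProgram.
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.readTextFile(args[0])   // hypothetical: first argument is an input path
               .writeAsText(args[1]);   // hypothetical: second argument is an output path
            env.execute("Supply batch job");
        }
    }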

This is the error I get, copied from the JobManager web portal:

    org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the JobManager gateway.
        at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:497)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:103)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
        at cw.supply.data.parser.maps.ProcessFileMessage.map(ProcessFileMessage.java:47)
        at cw.supply.data.parser.maps.ProcessFileMessage.map(ProcessFileMessage.java:25)
        at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
        at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
        at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:103)
        at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:110)
        at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:269)
        at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:86)
        at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
        at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: org.apache.flink.util.FlinkException: Could not connect to the leading JobManager. Please check that the JobManager is running.
        at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:789)
        at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:495)
        ... 30 more
    Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)
        at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:784)
        ... 31 more
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at scala.concurrent.Await.result(package.scala)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:77)
        ... 32 more

1 Answer:

Answer 0 (score: 1)

I figured out what the issue was after getting access to the environment. I was using the public address of the JobManager, where the port is not open. Instead, I switched to the private IP, since all the nodes are in the same subnet and there is no need to open the port to the world. Hope this helps someone else too.
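
In code terms, the fix boils down to pointing the client configuration at an address that is reachable from inside the cluster's network. Here is a sketch with placeholder values (the private IP and the default RPC port 6123 are examples, not the real values):

    import org.apache.flink.client.program.StandaloneClusterClient;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.JobManagerOptions;

    public class ClientConfig {
        static StandaloneClusterClient createClient() throws Exception {
            Configuration config = new Configuration();
            // Use the JobManager's private, in-subnet address instead of its public
            // one; the closed RPC port on the public address caused the gateway timeout.
            config.setString(JobManagerOptions.ADDRESS, "10.0.0.12"); // hypothetical private IP
            config.setInteger(JobManagerOptions.PORT, 6123);          // default JobManager RPC port
            return new StandaloneClusterClient(config);
        }
    }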