Flink: Cannot cancel a running job (streaming)

Date: 2016-10-19 09:22:32

Tags: apache-flink flink-streaming

I want to run a streaming job.
When I run it locally using start-cluster.sh and the Flink web interface, I have no problems.

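However, I am now trying to run my job with Flink on YARN (deployed on Google Dataproc), and when I try to cancel it, the job stays in the CANCELING state forever and one slot remains occupied in the TaskManager.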

Here are the logs I get:

2016-10-18 16:56:04,053 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to cancel task Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1)
2016-10-18 16:56:04,053 INFO org.apache.flink.runtime.taskmanager.Task - Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1) switched to CANCELING
2016-10-18 16:56:04,053 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1) (38bf32d9199a0c9383a8b1e8d73a1f65).
2016-10-18 16:56:34,055 WARN org.apache.flink.runtime.taskmanager.Task - Task 'Source: pubSubMessageAcknowledgingSource -> TrackingDisplayPushDeduplicater -> TrackingDisplayPushDeserializer -> (Sink: TrackingDisplayPushErrorFlumeSink, Map -> Sink: TrackingDisplayPushValidFlumeSink) (1/1)' did not react to cancelling signal, but is stuck in method:
java.net.PlainSocketImpl.socketConnect(Native Method)
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
java.net.Socket.connect(Socket.java:589)
java.net.Socket.connect(Socket.java:538)
sun.net.NetworkClient.doConnect(NetworkClient.java:180)
sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
sun.net.www.http.HttpClient.New(HttpClient.java:308)
sun.net.www.http.HttpClient.New(HttpClient.java:326)
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
com.accengage.bigdata.flink.streaming.sinks.FlumeSink.flush(FlumeSink.java:107)
com.accengage.bigdata.flink.streaming.sinks.FlumeSink.invoke(FlumeSink.java:80)
com.accengage.bigdata.flink.streaming.sinks.FlumeSink.invoke(FlumeSink.java:25)
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:39)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:39)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:126)
org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:35)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:39)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:38)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:373)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:358)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:346)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:329)
org.apache.flink.streaming.api.operators.StreamSource$NonTimestampContext.collect(StreamSource.java:160)
com.accengage.bigdata.flink.streaming.sources.PubSubAcknowledgingSource.run(PubSubAcknowledgingSource.java:148)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:80)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:53)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
java.lang.Thread.run(Thread.java:745)

Any idea what I am doing wrong?
What should I do?

Thanks.

1 answer:

Answer 0 (score: 2)

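I assume you are using a custom sink (com.accengage.bigdata.flink.streaming.sinks.FlumeSink) that uses some HTTP library to communicate with Flume. The stack trace shows the task thread blocked inside HttpURLConnection while FlumeSink.flush() opens a connection.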

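Most likely, the HTTP library is stuck in a loop when the interrupt is sent to the thread (this can happen, for example, when InterruptedException is ignored).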
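To illustrate what "ignoring InterruptedException" means, here is a minimal sketch with two hypothetical retry helpers (the class and method names are invented for this example, and Thread.sleep stands in for a blocking send plus back-off). The first swallows the exception, which also clears the thread's interrupt flag, so a cancelled task keeps looping; the second restores the flag and stops.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helpers contrasting how a retry loop reacts to Thread.interrupt(). */
final class InterruptHandlingDemo {

    /** Anti-pattern: swallows InterruptedException, so an interrupted thread keeps retrying. */
    static void retryIgnoringInterrupts(int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                TimeUnit.MILLISECONDS.sleep(100); // stands in for a blocking send + back-off
            } catch (InterruptedException e) {
                // BUG: the cancellation request is discarded and the interrupt flag is now cleared
            }
        }
    }

    /** Interrupt-aware version: stops retrying and restores the interrupt flag for callers. */
    static boolean retryHonoringInterrupts(int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the cancellation visible upstream
                return false;                       // give up instead of looping
            }
        }
        return true;
    }
}
```

When Flink cancels a task it interrupts the task thread; with the second pattern the loop exits promptly, while with the first it runs all remaining attempts as if nothing happened.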

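To fix this, you can either use an HTTP library that handles interrupts correctly, or call the library from a different thread, which does not receive the interrupt sent to the main task thread.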
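A minimal sketch of the second option, assuming a hypothetical helper class IsolatedCaller (not part of Flink or the original sink). The blocking call runs on a dedicated daemon thread, and the task thread only waits on a Future, which does respond to interrupts. This matters here because a thread blocked in a classic java.net.Socket.connect(), as in the stack trace, is not woken up by Thread.interrupt(). The timeout parameter is an extra assumption so the task thread never waits indefinitely.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper: runs a blocking call that may ignore interrupts on a separate
 * thread, so the calling (task) thread can still react to cancellation.
 */
final class IsolatedCaller implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true); // a stuck I/O thread must not keep the JVM alive
        return t;
    });

    <T> T call(Callable<T> blockingCall, long timeoutMs) throws Exception {
        Future<T> future = executor.submit(blockingCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS); // interruptible wait
        } catch (InterruptedException e) {
            future.cancel(true);                // best effort; the I/O thread may stay blocked
            Thread.currentThread().interrupt(); // propagate the cancellation to the caller
            throw e;
        } catch (TimeoutException e) {
            future.cancel(true);
            throw e;
        }
    }

    @Override
    public void close() {
        executor.shutdownNow(); // interrupts the worker; cannot unblock a stuck socket connect
    }
}
```

In a sink, open() could create one instance, invoke() could wrap the existing HTTP call (for example `caller.call(() -> { flush(); return null; }, 30_000)`, with flush() standing for whatever the sink already does), and close() could shut it down. Because the executor has a single thread, a connection that stays stuck also delays later calls, which then time out; a production sink would probably replace the executor in that case.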

Flink 1.2 will add some mechanisms to keep the system from getting stuck in cancel() calls. See FLINK-4715.