How to execute standard Java code after a Dataflow pipeline job, packaged in a servlet?

Asked: 2017-08-03 12:56:53

Tags: java google-app-engine google-cloud-dataflow

I have a Dataflow job packaged in a servlet (a runnable executing in BlockingDataflowPipelineRunner mode), triggered daily by a cron job in App Engine.

I cannot manage to execute some standard Java code once the pipeline completes. It works when launched locally with Jetty, but not when deployed on App Engine.
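For reference, the setup is roughly the following (a minimal sketch of what I describe above; the class name and option values are placeholders rather than my actual code):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner;

    // Servlet hit once a day by the App Engine cron entry (placeholder names).
    public class DailyPipelineServlet extends HttpServlet {
      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setProject("my-project");                     // placeholder
        options.setStagingLocation("gs://my-bucket/staging"); // placeholder
        options.setRunner(BlockingDataflowPipelineRunner.class);

        Pipeline p = Pipeline.create(options);
        // ... ingestion into Cloud Storage, BigQuery patch, etc. ...
        p.run(); // blocks until the Dataflow job finishes

        runPostProcessing(); // the standard Java code that never runs on App Engine
        resp.setStatus(HttpServletResponse.SC_OK);
      }

      private void runPostProcessing() { /* ... */ }
    }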

What is the best way to perform such a task?

Edit: here is the error I get. Basically, I first ingest data into Cloud Storage, then apply a BigQuery patch, and then run standard Java code.

In the logging, this is what I can find after the Dataflow logs (although I cannot see any "Stopping dataflow workers" log, which is strange...):

    javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:992)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
        at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:351)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:323)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob.waitToFinish(DataflowPipelineJob.java:236)
        at com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob.waitToFinish(DataflowPipelineJob.java:191)
        at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.run(BlockingDataflowPipelineRunner.java:117)
        at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.run(BlockingDataflowPipelineRunner.java:56)
        at ...
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.io.EOFException: SSL peer shut down incorrectly
        at sun.security.ssl.InputRecord.read(InputRecord.java:505)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
        ... 20 more

Any ideas?

Thanks

1 answer:

Answer 0 (score: 0)

This exception looks like the servlet timing out. I believe the HTTP request can only be held open for a few minutes before the timeout occurs, so I don't think it is safe to block on the Dataflow job inside the servlet code after launching it.

Could you start the process and wait on it in another thread within the servlet? Then build some mechanism for the client to keep polling the servlet until the Dataflow job has finished. (You would probably need the client to store an identifier/token to identify the running job.)
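A rough sketch of that idea, assuming Dataflow SDK 1.x (the class and method names below are hypothetical illustration, the status map is in-memory for brevity, and App Engine restricts raw thread creation, so a real version would need something like ThreadManager or a manual-scaling instance plus durable storage):

    import java.io.IOException;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class PipelineLauncherServlet extends HttpServlet {
      // Hypothetical in-memory status store; a real version should persist this.
      private static final Map<String, String> STATUS = new ConcurrentHashMap<>();

      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String token = UUID.randomUUID().toString();
        STATUS.put(token, "RUNNING");

        new Thread(() -> {
          try {
            runPipelineBlocking(); // the existing BlockingDataflowPipelineRunner run
            runPostProcessing();   // the standard Java code to execute afterwards
            STATUS.put(token, "DONE");
          } catch (Exception e) {
            STATUS.put(token, "FAILED");
          }
        }).start();

        // Return immediately; the client polls a status endpoint with this token.
        resp.getWriter().write(token);
      }

      private void runPipelineBlocking() { /* ... */ }
      private void runPostProcessing() { /* ... */ }
    }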

Alternatively, you could use DataflowPipelineRunner (which does not block) and return the job ID in the HTTP response.
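For example (a sketch against the Dataflow SDK 1.x API; I believe DataflowPipelineJob.getJobId() is the accessor for the ID, and the surrounding method is hypothetical):

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

    public class NonBlockingLaunch {
      // Submits the job and returns its ID without waiting for completion.
      static String launch() {
        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setRunner(DataflowPipelineRunner.class); // non-blocking runner

        Pipeline p = Pipeline.create(options);
        // ... build the pipeline ...

        DataflowPipelineJob job = (DataflowPipelineJob) p.run();
        return job.getJobId(); // return this to the client in the HTTP response
      }
    }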

You can then keep polling the servlet until the job is done: using the DataflowClient, keep checking the job until its state is no longer RUNNING, i.e. it is FAILED or SUCCESS.

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineDebugOptions#getDataflowClient--

Given a job ID, you can use the getJob call on DataflowClient. https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L89
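Putting those two references together, the polling step might look roughly like this (a sketch assuming the SDK 1.x API client returned by options.getDataflowClient(); the JOB_STATE_* strings follow the Dataflow API convention, and projectId/jobId come from the launch step above):

    import java.io.IOException;

    import com.google.api.services.dataflow.Dataflow;
    import com.google.api.services.dataflow.model.Job;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineDebugOptions;

    public class JobStatusChecker {
      // Returns the current state, e.g. JOB_STATE_RUNNING, JOB_STATE_DONE, JOB_STATE_FAILED.
      static String checkState(DataflowPipelineDebugOptions options,
                               String projectId, String jobId) throws IOException {
        Dataflow client = options.getDataflowClient();
        Job job = client.projects().jobs().get(projectId, jobId).execute();
        return job.getCurrentState();
      }
    }

A status servlet could call something like this on each poll and kick off the remaining Java code once the state is no longer JOB_STATE_RUNNING.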