Connecting to MySQL from Google Dataflow

Asked: 2018-05-04 17:26:21

Tags: mysql google-cloud-dataflow

I am trying to connect to an AWS RDS MySQL instance from Google Dataflow. I wrote a Java program to create the pipeline. The job is created successfully, but the MySQL connection always fails with the following error:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at com.google.cloud.dataflow.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:338)
    at com.google.cloud.dataflow.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:308)
    at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
    at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
    at com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
    at com.google.cloud.dataflow.worker.MapTaskExecutorFactory.create(MapTaskExecutorFactory.java:154)
    at com.google.cloud.dataflow.worker.DataflowWorker.doWork(DataflowWorker.java:308)
    at com.google.cloud.dataflow.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:264)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:133)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:113)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:100)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.beam.sdk.util.UserCodeException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn$DoFnInvoker.invokeSetup(Unknown Source)
    at com.google.cloud.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:63)
    at com.google.cloud.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:45)
    at com.google.cloud.dataflow.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:94)
    at com.google.cloud.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:74)
    at com.google.cloud.dataflow.worker.MapTaskExecutorFactory.createParDoOperation(MapTaskExecutorFactory.java:415)
    at com.google.cloud.dataflow.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:326)
    ... 14 more
Caused by: com.mysql.cj.jdbc.exceptions.CommunicationsException:
Communications link failure
Caused by: java.net.SocketTimeoutException: connect timed out

The Java source code is as follows:

public class MySQLToBQ {
    public static void main(String[] args) throws Exception {
        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setProject("project_name");
        options.setStagingLocation("gs://staging");
        options.setTempLocation("gs://temp");
        options.setRunner(DataflowRunner.class);
        options.setJobName("MySQL-To-BQ-" + new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date()));
        System.out.println("Job Name " + options.getJobName());
        Pipeline p = Pipeline.create(options);

        DataSourceConfiguration mysqlConfig = JdbcIO.DataSourceConfiguration.create(
                "com.mysql.cj.jdbc.Driver", "jdbc:mysql://mysql_host:3306/mysql_database")
                .withUsername("user")
                .withPassword("password");

        p.apply("mysql_source", JdbcIO.<SourceRow>read()
            .withDataSourceConfiguration(mysqlConfig)
            .withQuery("query")
            .withCoder(SerializableCoder.of(SourceRow.class))
            .withRowMapper(new JdbcIO.RowMapper<SourceRow>() {
                @Override
                public SourceRow mapRow(ResultSet resultSet) throws Exception {
                    SourceRow datarow = new SourceRow();
                    ResultSetMetaData rsmd = resultSet.getMetaData();
                    for (int i = 1; i <= rsmd.getColumnCount(); i++) {
                        datarow.add(rsmd.getColumnName(i), resultSet.getString(i));
                    }
                    return datarow;
                }
            })
        )
        .apply(table + "_transform", ParDo.of(new TransformToTableRow()))
        .apply(table + "_destination", BigQueryIO.writeTableRows()
            .to("table_name")
            .withSchema(getSchema())
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        );

        p.run();
    }
}

I was able to create a Compute Engine VM instance and successfully connect to the MySQL database from there.

2 Answers:

Answer 0 (score: 0):

On Dataflow you cannot whitelist a fixed IP to give Dataflow access to a SQL instance. I'm not sure whether this applies to AWS RDS, but for Cloud SQL you should use the JDBC socket factory instead: https://cloud.google.com/sql/docs/mysql/connect-external-app#java
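Following that page, the socket-factory connection string replaces the host/port in the JDBC URL with a Cloud SQL instance connection name. A minimal sketch (this applies to Cloud SQL, not RDS; the database name and instance connection name below are placeholders):

```java
// Sketch: building the Cloud SQL socket-factory JDBC URL.
// "mysql_database" and "project_name:us-central1:instance_name" are placeholders.
public class CloudSqlUrl {
    static String buildUrl(String database, String instanceConnectionName) {
        // No host/port: the socket factory opens the connection to the instance.
        return String.format(
                "jdbc:mysql:///%s?cloudSqlInstance=%s"
                        + "&socketFactory=com.google.cloud.sql.mysql.SocketFactory",
                database, instanceConnectionName);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("mysql_database", "project_name:us-central1:instance_name"));
    }
}
```

This URL would be passed to `JdbcIO.DataSourceConfiguration.create(...)` in place of the `jdbc:mysql://mysql_host:3306/...` string from the question.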

Answer 1 (score: 0):

For Java, you can use public access and the following: https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory
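Per that repository's README, the socket factory ships as a Maven artifact that must be on the worker classpath alongside Connector/J. A sketch of the dependency for Connector/J 8 (the question uses the `com.mysql.cj` driver); check the repository for the current version:

```xml
<dependency>
    <groupId>com.google.cloud.sql</groupId>
    <artifactId>mysql-socket-factory-connector-j-8</artifactId>
    <!-- placeholder: look up the latest release in the linked repository -->
    <version>RELEASE</version>
</dependency>
```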