How to write multiple values from a PCollection to a Redshift table

Asked: 2019-07-16 05:48:53

Tags: java jdbc amazon-redshift apache-beam

So I have a template that writes a single string to a Redshift table as one record.

public static void main(String[] args) throws Exception {
        // Step 1: Create Options
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

        // Step 2: Create Pipeline
        Pipeline pipeline = Pipeline.create(options);

        // Step 3: Create PCollection from array of random words <Strings>
        PCollection<String> collection = pipeline
                .apply(Create.of(Arrays.asList("start", "test", "case", "single", "end")))
                .setCoder(StringUtf8Coder.of());

        // Step 4: Execute transforms on the collection. This transform writes the string value to a table named 'test'
        collection.apply(JdbcIO.<String>write()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration
                        .create("com.amazon.redshift.jdbc42.Driver", options.getRedshiftUrl())
                        .withUsername(options.getUser()).withPassword(options.getPassword()))
                .withStatement("insert into example_schema.test values (?)")
                .withPreparedStatementSetter(new JdbcIO.PreparedStatementSetter<String>() {
                    public void setParameters(String element, PreparedStatement query) throws SQLException {
                        query.setString(1, element);
                    }
                }));

        pipeline.run().waitUntilFinish();
    }

I want to modify this so that it writes multiple fields made up of an integer, a double, and a string.

I have found a number of problems with my approach, but I feel like I may be randomly stumbling toward the right implementation without fully understanding the process.

public static void main(String[] args) throws Exception {
        // Step 1: Create Options
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

        String insertQuery = "insert into sample.mytable (item_int, item_string, item_double" +
                "values (?, ?, ?)";

        CustomObj custom_obj = new CustomObj(1, "", 0.5);

        // Step 2: Create Pipeline
        Pipeline pipeline = Pipeline.create(options);

        // Step 3: Create PCollection from array of random words <Strings>
        PCollection<CustomObj> collection = pipeline
                .apply(Create.of());

        // Step 4: Execute transforms on the collection. This transform writes the string value to a table named 'test'
        collection.apply(JdbcIO.<CustomObj>write()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration
                        .create("com.amazon.redshift.jdbc42.Driver", options.getRedshiftUrl())
                        .withUsername(options.getUser()).withPassword(options.getPassword()))
                .withStatement(insertQuery)
                .withPreparedStatementSetter(new JdbcIO.PreparedStatementSetter<CustomObj>() {
                    public void setParameters(CustomObj element, PreparedStatement query) throws SQLException {
                        query.setInt(1, element.intVal);
                        query.setString(2, element.stringVal);
                        query.setDouble(3, element.doubleVal);
                    }
                }));
        pipeline.run().waitUntilFinish();
    }


    public static class CustomObj
    {
        private Integer intVal;
        private String stringVal;
        private Double doubleVal;

        public CustomObj (Integer intVal, String stringVal, Double doubleVal)
        {
            this.intVal = intVal;
            this.stringVal = stringVal;
            this.doubleVal = doubleVal;
        }
    }

So far I understand that I need to set a suitable coder for my PCollection, but I am not sure which one is appropriate for the object type I am using.
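From what I have read, the coder setup might look something like the snippet below, though I have not been able to verify it; it assumes CustomObj implements Serializable so that SerializableCoder can handle it.

// Unverified guess at a coder setup: CustomObj would need to implement Serializable.
PCollection<CustomObj> collection = pipeline
        .apply(Create.of(Arrays.asList(
                new CustomObj(1, "a", 0.5),
                new CustomObj(2, "b", 1.5))))
        .setCoder(SerializableCoder.of(CustomObj.class));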

I am also not using the PreparedStatementSetter correctly, but when I look for clarification, the examples I find take entirely different approaches.

I know my question may be a bit vague, but I would appreciate being pointed to resources that cover the approach I have shown above in more detail.

The output produced is

 no suitable method found for of(no arguments)
[ERROR]     method org.apache.beam.sdk.transforms.Create.<T>of(java.lang.Iterable<T>) is not applicable
[ERROR]       (cannot infer type-variable(s) T
[ERROR]         (actual and formal argument lists differ in length))
[ERROR]     method org.apache.beam.sdk.transforms.Create.<T>of(T,T...) is not applicable
[ERROR]       (cannot infer type-variable(s) T
[ERROR]         (actual and formal argument lists differ in length))
[ERROR]     method org.apache.beam.sdk.transforms.Create.<K,V>of(java.util.Map<K,V>) is not applicable
[ERROR]       (cannot infer type-variable(s) K,V
[ERROR]         (actual and formal argument lists differ in length))
[ERROR]
[ERROR] -> [Help 1]

1 Answer:

Answer 0 (score: 0):

The error says the compiler could not pick the right overload of Create.of(). If you look at the documentation for Create, there is no zero-argument overload; you have to pass an iterable, a map, or varargs with a non-optional first argument. You probably meant Create.of(custom_obj), which should work the way you expect (in that case it creates a PCollection<CustomObj> containing a single element).
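As a rough, untested sketch of that minimal fix:

// Passing at least one element lets the compiler pick the
// Create.<T>of(T, T...) overload and infer T = CustomObj.
PCollection<CustomObj> collection = pipeline
        .apply(Create.of(custom_obj));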

The statement setter should also work the way you want; here is an example that does the same thing: https://github.com/apache/beam/blob/41478d00d34598e56471d99d0845ac16efa5b8ef/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L479
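Putting both points together, a corrected version of your main method might look roughly like the sketch below. It is untested: it assumes CustomObj implements Serializable (so SerializableCoder.of(CustomObj.class) can be used), keeps your driver class, options, and table name as-is, and adds the closing parenthesis that was missing from your insert statement.

public static void main(String[] args) throws Exception {
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

        // Closing parenthesis and space added before "values".
        String insertQuery = "insert into sample.mytable (item_int, item_string, item_double) "
                + "values (?, ?, ?)";

        Pipeline pipeline = Pipeline.create(options);

        // Create.of() is given concrete elements, and an explicit coder is set.
        PCollection<CustomObj> collection = pipeline
                .apply(Create.of(
                        new CustomObj(1, "start", 0.5),
                        new CustomObj(2, "end", 1.5)))
                .setCoder(SerializableCoder.of(CustomObj.class));

        // Each CustomObj becomes one row; the setter binds its fields to the three placeholders.
        collection.apply(JdbcIO.<CustomObj>write()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration
                        .create("com.amazon.redshift.jdbc42.Driver", options.getRedshiftUrl())
                        .withUsername(options.getUser()).withPassword(options.getPassword()))
                .withStatement(insertQuery)
                .withPreparedStatementSetter(new JdbcIO.PreparedStatementSetter<CustomObj>() {
                    public void setParameters(CustomObj element, PreparedStatement query) throws SQLException {
                        query.setInt(1, element.intVal);
                        query.setString(2, element.stringVal);
                        query.setDouble(3, element.doubleVal);
                    }
                }));

        pipeline.run().waitUntilFinish();
    }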