Question

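I am trying to read a Cloud SQL table in Java Beam using JdbcIO.Read. I want to convert each row in the ResultSet into a GenericData.Record using the .withRowMapper(ResultSet resultSet) method. Is there a way to pass a JSON schema string as an input to the .withRowMapper method, the way ParDo accepts side inputs as a PCollectionView?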

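I tried doing both read operations (reading from information_schema.columns and from my table) in the same JdbcIO.Read transform. However, I would like to generate the schema PCollection first and then read the table using JdbcIO.Read.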

I am generating the table's Avro schema on the fly like this:

Creating a PCollectionView to hold the JSON schema of each table.

PCollection<String> avroSchema = pipeline.apply(JdbcIO.<String>read()
        .withDataSourceConfiguration(config)
        .withCoder(StringUtf8Coder.of())
        .withQuery("SELECT DISTINCT column_name, data_type \n" +
                "FROM information_schema.columns\n" +
                "WHERE table_name = " + "'" + tableName + "'")
        .withRowMapper((JdbcIO.RowMapper<String>) resultSet -> {
            // code here to generate avro schema string
            // this works fine for me
        }));

Is there a better way to approach this problem?

Answer 1

Currently, the IO APIs do not accept side inputs.

Adding a ParDo right after the read and doing the mapping there should work. That ParDo can accept side inputs.
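
A minimal sketch of that idea, under a few assumptions: the row mapper would emit each row in a schema-independent form (here a column-name to value map), and the schema PCollection holds exactly one JSON string so it can become a singleton view. To keep the example runnable without a database, Create.of stands in for both JdbcIO reads. The names SideInputRowMapping, ToAvroJsonFn and toAvroJson are hypothetical, and the DoFn emits the record's JSON text rather than the GenericRecord itself, since a PCollection of GenericRecord would additionally need a coder, which is outside this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.MapCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class SideInputRowMapping {

  // Hypothetical helper: builds an Avro record from one row and renders it as text.
  // Values stay strings here; a real mapping would convert each value per field type.
  public static String toAvroJson(String schemaJson, Map<String, String> row) {
    Schema schema = new Schema.Parser().parse(schemaJson);
    GenericRecord record = new GenericData.Record(schema);
    for (Schema.Field field : schema.getFields()) {
      record.put(field.name(), row.get(field.name()));
    }
    return record.toString();
  }

  // Hypothetical DoFn: does the mapping that could not be done inside the RowMapper,
  // because only a ParDo can read the schema from a side input.
  static class ToAvroJsonFn extends DoFn<Map<String, String>, String> {
    private final PCollectionView<String> schemaView;

    ToAvroJsonFn(PCollectionView<String> schemaView) {
      this.schemaView = schemaView;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      // The schema is parsed per element for brevity; caching it would be cheaper.
      String schemaJson = c.sideInput(schemaView);
      c.output(toAvroJson(schemaJson, c.element()));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Stand-in for the schema string built from information_schema.columns
    // (assumption: the real PCollection contains exactly one element).
    String schemaJson = "{\"type\":\"record\",\"name\":\"TableRow\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"name\",\"type\":\"string\"}]}";
    PCollectionView<String> schemaView =
        pipeline.apply("Schema", Create.of(schemaJson)).apply(View.<String>asSingleton());

    // Stand-in for the JdbcIO.Read of the table itself.
    Map<String, String> row = new HashMap<>();
    row.put("id", "1");
    row.put("name", "alice");
    List<Map<String, String>> rowList = Collections.singletonList(row);
    PCollection<Map<String, String>> rows = pipeline.apply("Rows",
        Create.of(rowList).withCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));

    // The ParDo placed right after the read, with the schema attached as a side input.
    PCollection<String> records =
        rows.apply("MapRows", ParDo.of(new ToAvroJsonFn(schemaView)).withSideInputs(schemaView));

    pipeline.run().waitUntilFinish();
  }
}
```

In a real pipeline the two Create.of steps would be replaced by the JdbcIO.Read transforms from the question, with the table read using a RowMapper that only copies column values and leaves the Avro conversion to the ParDo.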

How to pass side inputs/extra inputs to the JdbcIO RowMapper in Java