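I want to connect to a Google Cloud SQL Postgres instance from an Apache Beam pipeline running on Google Dataflow.
I want to do this with the Python SDK.
I have not been able to find suitable documentation.
The Cloud SQL how-to guides do not seem to include anything about Dataflow:
https://cloud.google.com/sql/docs/postgres/
Can someone provide a documentation link or a GitHub example?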
Answer 0 (score: 3)
You can use the relational_db.Write and relational_db.Read transforms from beam-nuggets as follows:
First install beam-nuggets:
pip install beam-nuggets
Reading:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from beam_nuggets.io import relational_db

with beam.Pipeline(options=PipelineOptions()) as p:
    # Connection settings for the Postgres database to read from.
    source_config = relational_db.SourceConfiguration(
        drivername='postgresql+pg8000',
        host='localhost',
        port=5432,
        username='postgres',
        password='password',
        database='calendar',
    )
    # Read the rows of the 'months' table into a PCollection.
    records = p | "Reading records from db" >> relational_db.Read(
        source_config=source_config,
        table_name='months',
    )
    records | 'Writing to stdout' >> beam.Map(print)
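The drivername value 'postgresql+pg8000' follows the SQLAlchemy "dialect+driver" naming scheme, so the configuration fields above presumably end up in a SQLAlchemy-style database URL. The sketch below only illustrates that mapping. build_db_url is a hypothetical helper, not part of beam-nuggets, and how beam-nuggets assembles its URL internally is an assumption here.

```python
from urllib.parse import quote_plus


def build_db_url(drivername, username, password, host, port, database):
    """Assemble a SQLAlchemy-style URL: driver://user:password@host:port/database.

    Hypothetical helper for illustration only; not a beam-nuggets function.
    """
    # Percent-encode the credentials so characters such as '@' or ':' in a
    # password cannot be mistaken for URL separators.
    return "{}://{}:{}@{}:{}/{}".format(
        drivername, quote_plus(username), quote_plus(password),
        host, port, database)


url = build_db_url('postgresql+pg8000', 'postgres', 'password',
                   'localhost', 5432, 'calendar')
print(url)  # postgresql+pg8000://postgres:password@localhost:5432/calendar
```

Printing the URL this way can be a quick sanity check of the host, port, and database name before launching a pipeline.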
Writing:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from beam_nuggets.io import relational_db

with beam.Pipeline(options=PipelineOptions()) as p:
    # Each dict is one row; keys are column names.
    months = p | "Reading month records" >> beam.Create([
        {'name': 'Jan', 'num': 1},
        {'name': 'Feb', 'num': 2},
    ])
    source_config = relational_db.SourceConfiguration(
        drivername='postgresql+pg8000',
        host='localhost',
        port=5432,
        username='postgres',
        password='password',
        database='calendar',
        create_if_missing=True,  # create the database if it does not exist
    )
    table_config = relational_db.TableConfiguration(
        name='months',
        create_if_missing=True  # create the table if it does not exist
    )
    months | 'Writing to DB' >> relational_db.Write(
        source_config=source_config,
        table_config=table_config
    )
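The examples above hard-code host='localhost' and the password. For a Cloud SQL instance you would substitute an address your Dataflow workers can actually reach, and you may prefer not to keep credentials in source. The following is a minimal sketch of collecting the same keyword arguments from environment variables. The DB_* variable names and the db_settings_from_env helper are hypothetical, chosen only for illustration.

```python
import os


def db_settings_from_env(env=None):
    """Collect SourceConfiguration keyword arguments from environment variables.

    The DB_* variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "drivername": "postgresql+pg8000",
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "username": env.get("DB_USER", "postgres"),
        # No default for the password: fail loudly with KeyError if it is missing.
        "password": env["DB_PASSWORD"],
        "database": env.get("DB_NAME", "calendar"),
    }


# Example with an explicit mapping instead of the real environment.
settings = db_settings_from_env({"DB_HOST": "10.0.0.5", "DB_PASSWORD": "s3cret"})
print(settings["host"], settings["port"])
```

The resulting dict uses the same keyword names as the answer's code, so it could be passed as relational_db.SourceConfiguration(**settings). The values are read where the pipeline is constructed, that is, on the machine that launches the job.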
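Answer 1 (score: 1)
The Java SDK includes JdbcIO, which can connect to any database reachable through the standard Java JDBC mechanism. The Beam Python SDK currently has no equivalent. If it did, I imagine it would use the Python DB-API. Feel free to file a feature request or to contribute one: development should be fairly straightforward (for example, by mimicking the source code of the Java JdbcIO) but very useful :)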
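To make the DB-API idea concrete, here is a rough sketch of what such a reader could look like. It is not an existing Beam API. DbApiReader is a hypothetical class that only mirrors the setup/process/teardown shape of a Beam DoFn, and the standard-library sqlite3 module stands in for a Postgres DB-API driver so the example runs without a database server.

```python
import sqlite3
from contextlib import closing


class DbApiReader:
    """Hypothetical reader shaped like a Beam DoFn (setup/process/teardown).

    connect_fn is any zero-argument callable returning a DB-API connection.
    """

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn
        self._conn = None

    def setup(self):
        # Open the connection here rather than in __init__, since a DoFn
        # instance is serialized before it is shipped to workers.
        self._conn = self._connect_fn()

    def process(self, query):
        # Run the query and yield each row as a dict keyed by column name.
        with closing(self._conn.cursor()) as cur:
            cur.execute(query)
            columns = [desc[0] for desc in cur.description]
            for row in cur.fetchall():
                yield dict(zip(columns, row))

    def teardown(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


def make_demo_connection():
    """Stand-in for a Postgres connection: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE months (name TEXT, num INTEGER)")
    conn.executemany("INSERT INTO months VALUES (?, ?)",
                     [("Jan", 1), ("Feb", 2)])
    conn.commit()
    return conn


reader = DbApiReader(make_demo_connection)
reader.setup()
rows = list(reader.process("SELECT name, num FROM months ORDER BY num"))
reader.teardown()
print(rows)  # [{'name': 'Jan', 'num': 1}, {'name': 'Feb', 'num': 2}]
```

In a real pipeline the class would subclass beam.DoFn and connect_fn would call a Postgres driver's connect function. Both of those steps are left out here to keep the sketch self-contained.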