Reading records in batches from Google Cloud Datastore using Apache Beam

Time: 2019-04-11 17:42:11

Tags: python google-cloud-datastore google-cloud-dataflow apache-beam

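I am using Apache Beam to read data from Google Cloud Datastore, with the help of Beam's own io.gcp.datastore.v1.datastoreio Python API.

I am running the pipeline on Google Cloud Dataflow.

I want to make sure that my workers do not get overloaded with data.

How can I read the data in batches, or use some other mechanism to make sure that my workers do not pull a huge amount of data in one go?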

1 Answer:

Answer 0: (score: 0)

Dataflow does this for you automatically. By default, datastoreio splits your query into chunks of roughly 64 MB each, so no single worker has to pull the whole result set at once. If you want smaller pieces, use the num_splits parameter on the ReadFromDatastore initializer to specify how many pieces the query should be split into.
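To make that concrete, here is a rough sketch of the idea rather than the connector's actual code. `estimate_num_splits` only illustrates the arithmetic behind the 64 MB default. The lower and upper bounds in it are assumptions on my part, and both helper names are made up for this example. `build_read` passes `num_splits` to `ReadFromDatastore` from the v1 module mentioned in the question. It assumes you already have a Datastore query object, and that leaving `num_splits` at 0 lets Beam pick the value itself.

```python
# Sketch only. The 64 MB figure comes from the answer above; the bounds
# below are assumptions used for illustration.
DEFAULT_BUNDLE_SIZE_BYTES = 64 * 1024 * 1024
MIN_SPLITS = 12      # assumed lower bound
MAX_SPLITS = 50000   # assumed upper bound


def estimate_num_splits(estimated_size_bytes):
    """Approximate how many ~64 MB pieces a query of this size is cut into."""
    splits = int(round(float(estimated_size_bytes) / DEFAULT_BUNDLE_SIZE_BYTES))
    # Clamp the result to the assumed bounds.
    return max(MIN_SPLITS, min(MAX_SPLITS, splits))


def build_read(project, query, num_splits=0):
    """Return a ReadFromDatastore transform (hypothetical helper).

    num_splits=0 is assumed to mean "let Beam estimate the split count".
    """
    # Imported lazily so estimate_num_splits works without Beam installed.
    from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
    return ReadFromDatastore(project, query, num_splits=num_splits)
```

Usage would look something like `p | 'Read' >> build_read('my-project', query, num_splits=500)`, where `my-project` and `query` are placeholders. A higher `num_splits` means each piece is smaller, at the cost of more sub-queries against Datastore.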
