Question

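I am using Apache Beam to ingest logs from Pub/Sub that contain pageview traffic information. Each page has a unique ID, and as pageview log records arrive from Pub/Sub, Cloud Dataflow collects them in fixed windows and counts them. At the end of the combine step, we get one (listing ID, view count) pair per page.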

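I understand that ParDo is the Beam transform for generic parallel processing. After the combine, I want to implement a transform that queries Cloud Firestore for the existing pageview ID, reads the current view count, adds the new count to it, and writes the result back, updating the view counts one by one from the combined output. Any suggestions?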

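Below is my code so far for UpdateViewCount. Once I have the query result, a for loop to read it seems unnecessary: the pageview ID is unique, so the query returns only a single row.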

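import apache_beam as beam
from google.cloud import firestore
from google.cloud.exceptions import NotFound

# Assumed: db is a Firestore client created at module level.
db = firestore.Client()

class UpdateIntoFireStore(beam.DoFn):
    def process(self, element):
        listingid, count = element
        # Query for documents whose 'listingid' field matches this element.
        doc_ref = db.collection('listings').where('listingid', u'==', listingid)
        try:
            docs = doc_ref.get()
            for doc in docs:
                print(doc)
        except NotFound:
            print(u'No such document!')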

Answer 1

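I solved it. There is no need to loop over query results; I should fetch the specific document directly, using the ID as the document name.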

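# Inside process(): address the document directly by its ID instead of querying.
doc_ref = db.collection(u'listings').document(listingid)
try:
    doc = doc_ref.get()
    doc_dict = doc.to_dict()
    cur_count = doc_dict[u'count']
    # Add this window's count to the stored total.
    doc_ref.update({
        u'count': cur_count + count
    })
except NotFound:
    # No document for this ID yet: create it with this window's count.
    doc_ref.set({u'count': count})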
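
One caveat with the snippet above: as far as I know, in newer releases of the google-cloud-firestore client, `get()` on a missing document returns a snapshot whose `exists` attribute is `False` instead of raising `NotFound`, so the `except` branch may never run. Below is a minimal sketch of the same read-add-write step that checks `exists` explicitly. `add_to_count`, `FakeDocRef` and `FakeSnapshot` are hypothetical names introduced here; the fakes are in-memory stand-ins so the sketch runs without credentials, and they only mimic the `get`/`set`/`update`/`to_dict` calls used in this answer.

```python
class FakeSnapshot:
    """Hypothetical stand-in for a Firestore document snapshot."""

    def __init__(self, data):
        self.exists = data is not None
        self._data = data

    def to_dict(self):
        return None if self._data is None else dict(self._data)


class FakeDocRef:
    """Hypothetical in-memory stand-in for a Firestore document reference."""

    def __init__(self, store, doc_id):
        self._store = store
        self._doc_id = doc_id

    def get(self):
        return FakeSnapshot(self._store.get(self._doc_id))

    def set(self, data):
        self._store[self._doc_id] = dict(data)

    def update(self, data):
        self._store[self._doc_id].update(data)


def add_to_count(doc_ref, count):
    """Add `count` to the stored total, creating the document if it is missing."""
    snapshot = doc_ref.get()
    if snapshot.exists:
        current = snapshot.to_dict().get('count', 0)
        doc_ref.update({'count': current + count})
        return current + count
    # First time this ID is seen: start the total at this window's count.
    doc_ref.set({'count': count})
    return count


# Two windows reporting views for the same listing ID.
store = {}
first = add_to_count(FakeDocRef(store, '12345'), 3)
second = add_to_count(FakeDocRef(store, '12345'), 4)
print(first, second, store)
```

In the DoFn you would pass `db.collection(u'listings').document(listingid)` instead of the fake. Note that a read followed by a write is not atomic, so two workers updating the same ID at once could lose a count. If your client version provides `firestore.Increment`, something like `doc_ref.set({u'count': firestore.Increment(count)}, merge=True)` may avoid the separate read altogether, but treat that as something to verify against your installed version.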

How to take the output of a pipeline and perform reads and writes to Cloud Firestore
