Question

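I wrote a function to use in a pipeline built with Apache Beam (which I am new to). The function takes a dictionary, compares its keys against a list of wanted keys, and returns a new dictionary containing only those keys (and their respective values). The function is as follows: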

import logging

wanted_keys = ["foo", "bar", "baz"]

def reduced_dicts(line):
    # Log and skip dictionaries that lack any of the wanted keys.
    missing = set(wanted_keys) - set(line)
    if missing:
        logging.error("Missing values in dictionary. The following items are missing: %s", missing)
        return
    # Keep only the wanted keys and their values.
    return {key: value for key, value in line.items() if key in wanted_keys}

Now, if I run the following test pipeline, it runs fine and produces the expected result:

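import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

with TestPipeline() as p:
    lines = p | beam.Create([{"foo": 1, "qux": 2, "bar": 3, "baz": 4}])
    results = lines | "reducing dictionary" >> beam.Map(reduced_dicts)

    assert_that(results, equal_to([{"foo": 1, "bar": 3, "baz": 4}]))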

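The problem is that I am not mapping dictionaries one-to-one (some inputs produce no output), so I want to use beam.ParDo instead of beam.Map. If I simply change it to beam.ParDo, the output of the function becomes a list of keys rather than a dictionary. If I change the return statement to a yield statement, it works correctly, but I don't know why. Can anyone help?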
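A likely explanation, sketched without Beam: ParDo (like FlatMap) treats whatever the function returns as an iterable of output elements, and iterating over a dictionary yields its keys. With yield, the function becomes a generator that produces the whole dictionary as a single element. The helper emulate_pardo below is a hypothetical, simplified stand-in for that behaviour, not Beam's actual implementation, and WANTED_KEYS, reduce_with_return, and reduce_with_yield are illustrative names only.

```python
# Simplified model of ParDo/FlatMap output handling (an assumption for
# illustration, not Beam's real code): the callable's result is iterated,
# and each item of that iteration becomes one output element.
def emulate_pardo(fn, elements):
    outputs = []
    for element in elements:
        result = fn(element)
        if result is None:  # nothing is emitted for this element
            continue
        outputs.extend(result)  # iterating a dict produces its keys
    return outputs


WANTED_KEYS = ["foo", "bar", "baz"]


def reduce_with_return(line):
    # Returns a dict, so the caller iterates over its keys.
    return {key: line[key] for key in WANTED_KEYS if key in line}


def reduce_with_yield(line):
    # Generator: the caller iterates over it and receives one item, the dict.
    yield {key: line[key] for key in WANTED_KEYS if key in line}


data = [{"foo": 1, "qux": 2, "bar": 3, "baz": 4}]
returned = emulate_pardo(reduce_with_return, data)
yielded = emulate_pardo(reduce_with_yield, data)
print(returned)  # ['foo', 'bar', 'baz']
print(yielded)   # [{'foo': 1, 'bar': 3, 'baz': 4}]
```

Under the same assumption, returning a one-element list such as [new_dict] should behave like the yield version, since the list is what gets iterated.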

Output of dictionary changes when using the ParDo function

0 answers: