Using the ParDo function changes the output of a dictionary

Time: 2019-12-12 08:39:52

Tags: python dictionary apache-beam

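I wrote a function to use in a pipeline built with Apache Beam (which I am new to). The function receives a dictionary, compares its keys against a list of wanted keys, and returns a new dictionary containing only those keys (and their respective values). The function is as follows: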

import logging

wanted_keys = ["foo", "bar", "baz"]

def reduced_dicts(line):
    # Log an error and return None if any wanted key is absent from the input.
    missing = set(wanted_keys) - set(line)
    if missing:
        logging.error("Missing values in dictionary. The following items are missing: %s", missing)
        return
    # Keep only the wanted keys and their values.
    new_dict = dict()
    for key, value in line.items():
        if key in wanted_keys:
            new_dict[key] = value
    return new_dict

Now, if I run the following test pipeline, it works and gives the expected result:

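import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

with TestPipeline() as p:
    lines = p | beam.Create([{"foo": 1, "qux": 2, "bar": 3, "baz": 4}])
    results = lines | "reducing dictionary" >> beam.Map(reduced_dicts)

    # The "qux" entry should be dropped from the output.
    assert_that(results, equal_to([{"foo": 1, "bar": 3, "baz": 4}]))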

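The problem is that my mapping is not one-to-one per dictionary, so I would like to use beam.ParDo instead of beam.Map. If I simply change it to beam.ParDo, the output of the function becomes a list of keys instead of a dictionary. If I change the return statement to a yield statement, it works correctly, but I don't know why. Can anyone help?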
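
For reference, the difference can be reproduced in plain Python without Beam. The sketch below assumes that ParDo, when given a plain function, treats whatever the function hands back as an iterable of output elements and emits each item from it. The helper emit_like_pardo is a hypothetical stand-in for that behaviour, not Beam's actual code, and the two reduce_* functions are simplified versions of the function above without the missing-key check.

```python
WANTED_KEYS = ["foo", "bar", "baz"]


def reduce_with_return(line):
    """Returns the reduced dict itself (the style that suits beam.Map)."""
    return {key: value for key, value in line.items() if key in WANTED_KEYS}


def reduce_with_yield(line):
    """Generator version: yields the reduced dict as one output element."""
    yield {key: value for key, value in line.items() if key in WANTED_KEYS}


def emit_like_pardo(fn, element):
    """Hypothetical stand-in for ParDo: iterate over whatever fn produced."""
    return list(fn(element))


element = {"foo": 1, "qux": 2, "bar": 3, "baz": 4}

# Iterating over a returned dict walks its keys, so the keys become the elements.
returned = emit_like_pardo(reduce_with_return, element)

# Iterating over the generator produces exactly one element: the whole dict.
yielded = emit_like_pardo(reduce_with_yield, element)

print(returned)
print(yielded)
```

Under that assumption, a returned dictionary is iterated key by key, which would explain the list of keys, while a generator produced by yield hands over the whole dictionary as a single element. Returning a one-element list such as [new_dict] should behave the same way as yield.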

0 answers:

No answers