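I wrote a function to be used in a pipeline built with Apache Beam (which I am new to). The function receives a dictionary, compares its keys against a list of wanted keys, and returns a new dictionary containing only those keys (and their respective values). The function is as follows: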
import logging

wanted_keys = ["foo", "bar", "baz"]

def reduced_dicts(line):
    # Log an error and return None if any wanted key is missing from the input.
    if not all(elem in line for elem in wanted_keys):
        logging.error("Missing values in dictionary. The following items are missing: "
                      + str(set(wanted_keys) - set(line)))
        return
    # Keep only the wanted keys and their values.
    new_dict = dict()
    for key, value in line.items():
        if key in wanted_keys:
            new_dict[key] = value
    return new_dict
Now, if I run the following test pipeline, it runs fine and the result is as expected:
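import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

with TestPipeline() as p:
    lines = p | beam.Create([{"foo": 1, "qux": 2, "bar": 3, "baz": 4}])
    results = lines | "reducing dictionary" >> beam.Map(reduced_dicts)
    assert_that(results, equal_to(
        [{"foo": 1, "bar": 3, "baz": 4}]))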
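The problem is that my mapping of dictionaries is not one-to-one, so I would like to use beam.ParDo instead of beam.Map. If I simply swap in beam.ParDo, the output of the function becomes a list of keys rather than a dictionary. If I change the return statement to a yield statement, it works correctly, but I don't know why. Can anyone help?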
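
For reference, the difference between return and yield described above can be reproduced without Beam. The sketch below assumes that ParDo treats whatever the function returns as an iterable of outputs and emits each item of it. The helper emit_like_pardo is a hypothetical stand-in written for illustration, not Beam's actual implementation, and the two reduced_dict_* functions are simplified variants of reduced_dicts without the logging branch.

```python
def reduced_dict_return(line, wanted_keys=("foo", "bar", "baz")):
    """Variant that returns one dict, which suits a one-to-one Map."""
    return {key: line[key] for key in wanted_keys if key in line}


def reduced_dict_yield(line, wanted_keys=("foo", "bar", "baz")):
    """Variant that yields, so calling it produces a generator of dicts."""
    yield {key: line[key] for key in wanted_keys if key in line}


def emit_like_pardo(fn, element):
    """Hypothetical stand-in for ParDo (an assumption, not Beam code):
    iterate over fn's result and emit every item; None means no output."""
    result = fn(element)
    return [] if result is None else list(result)


element = {"foo": 1, "qux": 2, "bar": 3, "baz": 4}

# Iterating over a returned dict walks its keys, so the keys are emitted.
with_return = emit_like_pardo(reduced_dict_return, element)

# Iterating over the generator produces the single dict that was yielded.
with_yield = emit_like_pardo(reduced_dict_yield, element)

print(with_return)
print(with_yield)
```

Under that assumption, the returned dictionary is itself iterated, which would explain why only its keys appear downstream, while the yield version hands over a generator whose single item is the whole dictionary. Returning a one-element list such as [new_dict] should behave the same way as yield.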