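I want to run a lightweight validation before writing a DataFrame. Before writing, I have to serialize the DataFrame via "foo", and I increment an accumulator inside "foo":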
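
    acc = sc.accumulator(0)
    output = df.map(foo)
    if acc.value < THRESHOLD:
        raise ValueError(f"Failed validation: {acc.value} < {THRESHOLD}")
    output.write(path)

The problem is that acc.value == 0 until output.write(path) runs, because the accumulator is apparently not evaluated until then. That is exactly the write I want to avoid when the data has failed validation. What is the right design pattern?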
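Answer 0 (score: 1)
If your goal is to validate the count before publishing the data to some output path, just write the data to an intermediate path first. Then read the accumulator and, if the count is valid, rename the intermediate path to the actual output destination.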

    acc = sc.accumulator(0)
    output = df.map(foo)
    # Writing is the action that actually runs foo and updates acc.
    output.write(tmp_path)
    if acc.value < THRESHOLD:
        # fs.delete(tmp_path)
        raise ValueError(f"Failed validation: {acc.value} < {THRESHOLD}")
    else:
        fs.rename(tmp_path, path)
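
To make the write-then-rename idea concrete, here is a minimal sketch that runs without Spark. It is only a local-filesystem analogue, not Spark code: Python's lazy map() stands in for the transformation, a plain integer stands in for the accumulator, and os.replace stands in for fs.rename. The names publish_if_valid, serialize, and threshold are made up for illustration.

```python
import os
import tempfile


def publish_if_valid(records, serialize, final_path, threshold):
    """Write records to a temp file; publish to final_path only if enough were counted."""
    count = 0  # hypothetical stand-in for the Spark accumulator

    def counting_serialize(record):
        nonlocal count
        count += 1
        return serialize(record)

    # map() is lazy, much like a Spark transformation: count is still 0 here.
    lazy_output = map(counting_serialize, records)

    # Create the temp file next to the destination so the rename stays on one filesystem.
    target_dir = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            # Consuming the iterator plays the role of the action that triggers the work.
            for line in lazy_output:
                handle.write(line + "\n")
        # Only now does the counter reflect what was written.
        if count < threshold:
            raise ValueError(f"Failed validation: {count} < {threshold}")
        os.replace(tmp_path, final_path)  # publish the validated output
    except BaseException:
        # Failed write or failed validation: discard the intermediate file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return count
```

Calling publish_if_valid([1, 2, 3], str, "out.txt", 2) would write three lines and then rename the temp file to out.txt, while a threshold of 5 would raise ValueError and leave no file behind. In real Spark, also keep in mind that accumulator updates made inside a transformation can be applied more than once if a task is retried, so a count-based check like this may overcount.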