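Is it possible to supply arguments in ParDo(classname(args))? I tried, but it reports that the process function takes 3 args while only 2 were given. I also tried the __init__ function. Please help, and if possible let me know how to rewrite this code. Note: splitcols and filtercols work fine because they are given no args.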
import apache_beam as beam

class splitcols(beam.DoFn):
    def process(self, elements):
        return [elements.split(',')]

class filtercols(beam.DoFn):
    def process(self, elements):
        if elements[1] == 'Drs.':
            return [elements]

class addvals(beam.DoFn):
    def process(self, elements, a):  # here I tried to accept the arg
        return [(elements[a], 1)]

p1 = beam.Pipeline()
attendance_count = (
    p1
    | beam.io.ReadFromText('100Records.csv')
    | beam.ParDo(splitcols())
    | beam.ParDo(filtercols())
    | beam.ParDo(addvals(2))  # here I tried to give args
    # | beam.CombinePerKey(sum)
    # | beam.Map(lambda employee: str(employee))
    | beam.io.WriteToText('data/pardooutput')
)
p1.run()
Answer 0: (score: 0)
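You can either pass the args to __init__ and store them on the instance, as suggested in the comments on the question, or pass them to ParDo as extra arguments, the same mechanism used for side inputs: beam.ParDo(addvals(), 2). In the second form, ParDo forwards the 2 to process as the parameter a. Your current code fails because addvals(2) hands the 2 to the constructor, so process is still called with only self and the element.
See a similar example here: https://beam.apache.org/documentation/programming-guide/#side-inputs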