Using args with the class name in ParDo

Time: 2019-11-18 05:25:33

Tags: python apache-beam

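Is it possible to pass arguments in ParDo(classname(args))? I tried, but it says the process function takes 3 args and only 2 were given. I also tried an __init__ function. Please help. If possible, please let me know how to rewrite this code. Note: splitcols and filtercols work fine, since they are given no args.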

    import apache_beam as beam
    class splitcols(beam.DoFn):
      def process(self,elements):
        return [elements.split(',')]
    class filtercols(beam.DoFn):
      def process(self,elements):
        if elements[1]=='Drs.':
          return [elements]
    class addvals(beam.DoFn):
      def process(self,elements,a): #here I tried to accept the arg
        return [(elements[a],1)]
    p1 = beam.Pipeline()
    attendance_count = (
        p1
        |beam.io.ReadFromText('100Records.csv')
        |beam.ParDo(splitcols())
        |beam.ParDo(filtercols())
        |beam.ParDo(addvals(2))  #here I tried to give args
        #|beam.CombinePerKey(sum)
        #|beam.Map(lambda employee: str(employee))
        |beam.io.WriteToText('data/pardooutput')
    )
    p1.run()

1 Answer:

Answer 0: (score: 0)

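You can pass the args to __init__, as suggested in the comments on the question, or you can pass them to ParDo as extra arguments (the same mechanism used for side inputs): beam.ParDo(addvals(), 2)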

See a similar example here: https://beam.apache.org/documentation/programming-guide/#side-inputs