Got multiple values for keyword argument 'inputLocForTrain'

Time: 2018-11-10 09:30:51

Tags: python multiprocessing

I have defined a method in a class:

class X:

    def __init__(self, logger, tableDataLoader, dataCleanser, timeSeriesFunctions):
        self.logger = logger
        self.tableDataLoader = tableDataLoader
        self.dataCleanser = dataCleanser
        self.timeSeriesFunctions = timeSeriesFunctions

    def preProcess(self, inputLocForTrain, inputLocForTest, outputLoc, region, gl):

        # Do Something

I am trying to call this preProcess method through a multiprocessing class defined like this:

from multiprocessing import Process

class ProcessManager:

    def __init__(self, spark, logger):
        self.spark = spark
        self.logger = logger

    def applyMultiProcessExecution(self, func_arguments, targetFunction, iterableList):

        self.logger.info("Function Arguments : {}".format(func_arguments))
        jobs = []
        for x in iterableList:
            try:
                p = Process(target=targetFunction, args=(x,), kwargs=func_arguments)
                jobs.append(p)
                p.start()
            except Exception:
                raise RuntimeError("Unable to create process for GL : {}".format(x))

        for job in jobs:
            job.join()

Now I call my ProcessManager like this:

processManager = ProcessManager(spark=spark, logger=logger)
dataFetcherFactory = DataFetcherFactory(logger)
dataFetcher = dataFetcherFactory.getDataFetcher(pipelineType=pipelineType)
dataCleanser = DataCleanser(logger)
timeSeriesFunctions = TimeSeriesFunctions(logger)
tableDataLoader = TableDataLoader(logger=logger, dataFetcher=dataFetcher, dataCleanser=dataCleanser,
                                  timeSeriesFunctions=timeSeriesFunctions)
preProcessDataForPCAModel = X(logger=logger,
                              tableDataLoader=tableDataLoader,
                              dataCleanser=dataCleanser,
                              timeSeriesFunctions=timeSeriesFunctions)
arguments = {FeatureConstants.INPUT_LOCATION_FOR_TRAIN: inputLocForTrain,
             FeatureConstants.INPUT_LOCATION_FOR_TEST: inputLocForTest,
             FeatureConstants.OUTPUT_LOCATION: outputLoc,
             REGION: region}

processManager.applyMultiProcessExecution(func_arguments=arguments,
                                          targetFunction=preProcessDataForPCAModel.preProcess,
                                          iterableList=[504])

This gives me the following error:

Process Process-1:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
TypeError: preProcess() got multiple values for keyword argument 'inputLocForTrain'

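I went through several Stack Overflow posts where people suggest that this happens because of the self argument being part of the class. I cannot work out how to solve my problem, because I need the constructor arguments to be available on self for the computation.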

Can someone let me know how to solve this?

1 answer:

Answer 0: (score: 1)

Try changing:

def preProcess(self, inputLocForTrain, inputLocForTest, outputLoc, region, gl):

to:

def preProcess(self, gl, inputLocForTrain, inputLocForTest, outputLoc, region):

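The positional argument should come first. As the traceback shows, Process invokes the target as self._target(*self._args, **self._kwargs), which here amounts to preProcess(x, **func_arguments). Because preProcess is a bound method, self is already filled in, so x is bound to the first parameter after self. With the original signature that parameter is inputLocForTrain, which is also supplied through kwargs, hence the TypeError. With gl first, x is bound to gl and the keyword arguments fill the remaining parameters.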
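
The binding problem can be reproduced without spawning any processes. The following is a minimal sketch, not the asker's actual code: the classes Before and After, the helper call_like_process, and the path strings are all hypothetical placeholders. The helper only mirrors the call shown in the traceback. The last call illustrates an alternative that keeps the original signature and passes gl by keyword instead.

```python
class Before:
    # Original parameter order from the question: gl comes last.
    def preProcess(self, inputLocForTrain, inputLocForTest, outputLoc, region, gl):
        return (gl, inputLocForTrain)


class After:
    # Reordered as suggested: gl is the first parameter after self.
    def preProcess(self, gl, inputLocForTrain, inputLocForTest, outputLoc, region):
        return (gl, inputLocForTrain)


def call_like_process(target, args, kwargs):
    # Hypothetical helper mirroring the traceback line:
    # self._target(*self._args, **self._kwargs)
    return target(*args, **kwargs)


# Placeholder values, not taken from the question.
func_arguments = {
    "inputLocForTrain": "train_path",
    "inputLocForTest": "test_path",
    "outputLoc": "out_path",
    "region": "some_region",
}

try:
    call_like_process(Before().preProcess, (504,), func_arguments)
    error_message = None
except TypeError as exc:
    # 504 was bound to inputLocForTrain, which kwargs supplies as well.
    error_message = str(exc)

# Fix 1: reordered signature, 504 is bound to gl.
fixed_result = call_like_process(After().preProcess, (504,), func_arguments)

# Fix 2: original signature, but gl is passed by keyword and args is empty.
keyword_result = call_like_process(
    Before().preProcess, (), dict(func_arguments, gl=504)
)

print(error_message)
print(fixed_result)
print(keyword_result)
```

In terms of the question's ProcessManager, the second variant would mean building the kwargs per item and passing no positional args to Process; whether that fits better than reordering the signature depends on how else preProcess is called.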