Question

I have the following code:

@classmethod
def load(self):
    with open('yaml/instruments.yaml', 'r') as ymlfile:
        return {v['name']: Instrument(**v) for (v) in list(yaml.load_all(ymlfile))}

I would like to load the instruments in parallel using something like this:

return ThreadPoolExecutor.map(Instrument, list(yaml.load_all(ymlfile))

But I'm not sure how to pass the arguments through.

Here is an example entry from instruments.yaml:

---
    name: 'corn'
    #Trade December corn only
    multiplier: 5000
    contract_prefix: 'C'
    months_traded: [3, 5, 7, 9, 12]
    quandl: 'CHRIS/CME_C2'
    first_contract: 196003
    backtest_from: 199312
    trade_only: [12]
    contract_name_prefix: 'C'
    quandl_database: 'CME'
    slippage: 0.125 #half the spread
    ib_code: 'ZC'

How can I refactor my code into a map so that I can use ThreadPoolExecutor?

Answer 1

The simple solution is to define a small top-level worker function for the executor to call:

def make_instrument_pair(d):
    return d['name'], Instrument(**d)

Then change:

@classmethod
def load(self):
    with open('yaml/instruments.yaml', 'r') as ymlfile:
        return {v['name']: Instrument(**v) for (v) in list(yaml.load_all(ymlfile))}

to:

@classmethod
def load(self):
    with open('yaml/instruments.yaml') as ymlfile,\
         concurrent.futures.ThreadPoolExecutor(8) as executor:
        return dict(executor.map(make_instrument_pair, yaml.load_all(ymlfile)))
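
To see how the arguments pass through without needing the YAML file, here is a minimal self-contained sketch. The Instrument dataclass and the list of dicts are hypothetical stand-ins for the real class and for what yaml.load_all would yield; the 'wheat' record is made up purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Instrument:
    # Hypothetical stand-in for the real Instrument class.
    name: str
    multiplier: int
    ib_code: str


def make_instrument_pair(d):
    # Unpack one record into keyword arguments and return a (key, value)
    # pair, so that dict() can consume the mapped results directly.
    return d['name'], Instrument(**d)


# Stand-in for the documents that yaml.load_all would yield.
records = [
    {'name': 'corn', 'multiplier': 5000, 'ib_code': 'ZC'},
    {'name': 'wheat', 'multiplier': 5000, 'ib_code': 'ZW'},  # made-up record
]

with ThreadPoolExecutor(max_workers=8) as executor:
    # map() calls the worker once per record and yields results in input order.
    instruments = dict(executor.map(make_instrument_pair, records))

print(instruments['corn'].ib_code)
```

Each dict is passed to the worker as a single positional argument, and the worker does the ** unpacking itself, which is why no extra argument plumbing is needed in the map call.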

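As I noted in the comments, this probably won't gain you anything; the GIL means that threads won't improve performance unless:

The work is done in a third-party C extension that explicitly releases the GIL before doing a lot of C-level work
The work is primarily I/O bound (or otherwise spends most of its time blocked in some way, whether sleeping, waiting on a lock, etc.)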
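
The blocking case can be illustrated with a rough timing sketch. Here time.sleep stands in for real I/O (an assumption made for the demo; it releases the GIL while waiting), and blocking_task is a hypothetical worker, not part of the original code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(seconds):
    # time.sleep releases the GIL, standing in for I/O such as a network call.
    time.sleep(seconds)
    return seconds


delays = [0.1] * 4

# Run the tasks one after another.
start = time.perf_counter()
sequential = [blocking_task(d) for d in delays]
sequential_elapsed = time.perf_counter() - start

# Run the same tasks on four threads, so the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = list(executor.map(blocking_task, delays))
threaded_elapsed = time.perf_counter() - start

print(f"sequential: {sequential_elapsed:.2f}s, threaded: {threaded_elapsed:.2f}s")
```

If blocking_task were replaced with pure-Python CPU work, such as building an object from a dict, the threaded run would typically be no faster, because only one thread can execute Python bytecode at a time.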

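Unless an Instrument is extremely expensive to construct, even using ProcessPoolExecutor probably won't help; you need to be doing quite a bit of work in each dispatched task, or you will waste more time on task management (and, for processes, on serialization and interprocess communication) than you gain from parallelism.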
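
One rough way to judge this is to time the construction against the cost of pickling a record, since a process pool has to serialize every argument and result. The sketch below is only an estimate under stated assumptions: Instrument is a hypothetical stand-in, and the pickle round trip of the input dict is a lower bound on the real per-task overhead, which also includes pickling the result and the interprocess transfer.

```python
import pickle
import timeit
from dataclasses import dataclass


@dataclass
class Instrument:
    # Hypothetical stand-in; the real constructor may do far more work.
    name: str
    multiplier: int
    ib_code: str


record = {'name': 'corn', 'multiplier': 5000, 'ib_code': 'ZC'}
runs = 10_000

# Time spent actually building the object.
construct_seconds = timeit.timeit(lambda: Instrument(**record), number=runs)

# Time spent serializing and deserializing the argument alone.
roundtrip_seconds = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(record)), number=runs
)

print(f"construct: {construct_seconds:.4f}s, pickle round trip: {roundtrip_seconds:.4f}s")
```

If the two numbers are of the same order, moving the work to another process is unlikely to pay off.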
