Python multiprocessing not working in Jupyter Notebook

Time: 2019-05-28 21:18:34

Tags: python multiprocessing jupyter-notebook

I'm new to Python's multiprocessing module and I'm working in a Jupyter notebook. When I try to run the following code, I keep getting AttributeError: Can't get attribute 'load' on <module '__main__' (built-in)>

When I run the file, there is no output; it just keeps loading.
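The error message comes from pickle, which multiprocessing uses to send the function to its worker processes. The sketch below is plain Python, independent of Jupyter, and the stand-in load and the deletion trick are illustrative only. It shows that pickle records a function only by its module name and function name, so unpickling fails with an AttributeError in any process whose module of that name does not define the function. That is the situation of a spawned worker whose __main__ is the built-in placeholder rather than the notebook.

```python
import pickle
import sys


def load(date):
    """Stand-in for the question's load(); only its name matters here."""
    return date


# pickle stores a function by reference: the module name plus the function
# name, not the function's code.
payload = pickle.dumps(load)

# Simulate a worker process whose module of that name has no `load`, e.g. a
# spawned worker whose __main__ is a placeholder instead of the notebook.
module = sys.modules[load.__module__]
original = module.load
del module.load
try:
    pickle.loads(payload)
except (AttributeError, pickle.UnpicklingError) as exc:
    error = exc  # the same kind of error the question reports
else:
    error = None
finally:
    module.load = original  # restore the function

print(type(error).__name__, error)
```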

import pandas as pd
import datetime
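import urllib.error  # import the submodule explicitly; `import urllib` alone does not guarantee urllib.error is loaded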
import requests
from pprint import pprint
import time
from io import StringIO
from multiprocessing import Process, Pool

symbols = ['AAP']

start = time.time()
dflist = []

def load(date):
    if date is None:
        return
    url = "http://regsho.finra.org/FNYXshvol{}.txt".format(date)
    try:
        df = pd.read_csv(url,delimiter='|')
        if any(df['Symbol'].isin(symbols)):
            stocks = df[df['Symbol'].isin(symbols)]
            print(stocks.to_string(index=False, header=False))
            # Save stocks to mysql
        else:
            print(f'No stock found for {date}' )
    except urllib.error.HTTPError:
        pass

pool = []
numdays = 365
start_date = datetime.datetime(2019, 1, 15 )  #year - month - day
datelist = [
        (start_date - datetime.timedelta(days=x)).strftime('%Y%m%d') for x in range(0, numdays)
        ]

pool = Pool(processes=16)
pool.map(load, datelist)

pool.close()
pool.join()

print(time.time() - start)

What do I need to do to run this code directly from the notebook without problems?

1 answer:

Answer 0 (score: 0)

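One approach: worker processes receive load by reference (module name plus function name) and have to import it themselves. With the spawn start method, the default on Windows, a function defined in the notebook's __main__ cannot be found that way, but a function in an importable module can.
1. Move the load function into its own file, e.g. worker.py.
2. import worker in the notebook and pass worker.load to the pool.
3. Run the pool under an if __name__ == '__main__': guard: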

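import datetime
from multiprocessing import Pool

import worker  # worker.py next to the notebook, defining load()

if __name__ == '__main__':
    numdays = 365
    start_date = datetime.datetime(2019, 1, 15)  # year, month, day
    datelist = [
        (start_date - datetime.timedelta(days=x)).strftime('%Y%m%d')
        for x in range(numdays)
    ]

    # worker.load is looked up by module name ("worker"), so the child
    # processes can import it even though the notebook is not a file
    pool = Pool(processes=16)
    pool.map(worker.load, datelist)

    pool.close()
    pool.join()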
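The answer does not show worker.py itself. A minimal sketch, assuming the file sits in the same folder as the notebook: load also reads the global symbols, so that list has to move into worker.py as well, otherwise the workers would fail with a NameError. As an assumption of this sketch, load returns the matching rows instead of printing them, because output printed inside worker processes may not appear in the notebook cell.

```python
# worker.py: a sketch of the module the answer refers to.
# load() reads the global `symbols`, so that list has to live here too.
import urllib.error

import pandas as pd

symbols = ['AAP']


def load(date):
    """Download one day's FINRA short-volume file and return the rows for `symbols`.

    Returns None when date is None, when the download raises an HTTP error,
    or when no symbol matches.
    """
    if date is None:
        return None
    url = "http://regsho.finra.org/FNYXshvol{}.txt".format(date)
    try:
        df = pd.read_csv(url, delimiter='|')
    except urllib.error.HTTPError:
        return None  # skip dates whose file cannot be downloaded
    stocks = df[df['Symbol'].isin(symbols)]
    return None if stocks.empty else stocks
```

With this version, results = pool.map(worker.load, datelist) in the notebook gives one entry per date (None where nothing matched), and the non-None DataFrames can be combined with pd.concat, provided at least one date matched.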
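A different option, not part of the original answer: load likely spends most of its time waiting on HTTP downloads, so a thread pool from the standard library's concurrent.futures can run it directly in the notebook. Threads share the notebook's interpreter, so nothing has to be pickled and load can stay in a notebook cell. A sketch; the helper names make_datelist and run_threaded are hypothetical.

```python
import datetime
from concurrent.futures import ThreadPoolExecutor


def make_datelist(start_date, numdays):
    """Hypothetical helper: the question's list of YYYYMMDD strings, newest first."""
    return [
        (start_date - datetime.timedelta(days=x)).strftime('%Y%m%d')
        for x in range(numdays)
    ]


def run_threaded(func, items, max_workers=16):
    """Hypothetical helper: call func on every item using a thread pool.

    Threads run inside the notebook's own process, so func may be defined in
    a notebook cell. Results come back in the same order as items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


# In the notebook, with load defined in an earlier cell:
# results = run_threaded(load, make_datelist(datetime.datetime(2019, 1, 15), 365))
```

The design trade-off: threads avoid the pickling and __main__ import issues entirely, while separate processes only pay off when the per-file work is CPU-bound.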