Question

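I need to parse around 1000 URLs. So far I have a function that returns a pandas DataFrame after parsing a URL. How should I best structure the program so that all the DataFrames are combined? I am also not sure how to get return values back from the futures. In the example below, how can I eventually merge all the temporary DataFrames into a single DataFrame (i.e. finalDF = finalDF.append(temp))?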

import concurrent.futures
import pandas as pd

def Parser(ptf):
    temp = pd.DataFrame()
    URL = "http://" + str(ptf)
    # ...some complex operations, including a requests.get(URL), which eventually produce temp: a pandas DataFrame
    return temp  # returns a pandas DataFrame

def conc_caller(ptf):
    temp = Parser(ptf)

    # this won't work because finalDF is not defined here; unclear how to handle this
    finalDF = finalDF.append(temp)
    return finalDF

booklist = ['a', 'b', 'c']
finalDF = pd.DataFrame()
executor = concurrent.futures.ProcessPoolExecutor(3)
futures = [executor.submit(conc_caller, item) for item in booklist]
concurrent.futures.wait(futures)

Another problem is that I get this error message:

 An attempt has been made to start a new process before the
 current process has finished its bootstrapping phase.

Any suggestions on how to fix the code are appreciated.

Answer 1

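You must protect the launching code with if __name__ == '__main__': to prevent processes from being created endlessly. The guard has to come before the code that starts the work, so the executor creation, the submit calls, and concurrent.futures.wait(futures) all sit inside it. Without the guard, each worker process that re-imports the main module runs the launching code again, which is what triggers the "bootstrapping phase" error.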
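The following is a minimal sketch of how the guard and the result collection could fit together, not a definitive implementation. The names parse_one, collect_frames, and executor_cls are hypothetical, and parse_one is only a stand-in for the real Parser (it builds a tiny DataFrame instead of calling requests.get). Each worker simply returns its DataFrame, the parent reads it back with future.result(), and the frames are merged once with pd.concat rather than with DataFrame.append, which was removed in pandas 2.0.

```python
import concurrent.futures

import pandas as pd


def parse_one(item):
    """Hypothetical stand-in for Parser: returns a small DataFrame for one item.

    A real version would fetch and parse "http://" + str(item) here.
    """
    return pd.DataFrame({"item": [item], "url": ["http://" + str(item)]})


def collect_frames(items, max_workers=3,
                   executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Run parse_one on every item in a pool and merge the returned frames."""
    frames = []
    with executor_cls(max_workers=max_workers) as executor:
        futures = [executor.submit(parse_one, item) for item in items]
        # as_completed yields each future as it finishes; result() returns
        # the worker's return value or re-raises the exception it raised.
        for future in concurrent.futures.as_completed(futures):
            frames.append(future.result())
    # Concatenate once in the parent instead of appending inside the workers.
    # Note: pd.concat raises ValueError if frames is empty.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Everything that starts worker processes stays behind this guard.
    final_df = collect_frames(["a", "b", "c"])
    print(final_df.sort_values("item").reset_index(drop=True))
```

Because the frames arrive in completion order, the row order of the result is not guaranteed; sort afterwards if order matters. Since the work here is mostly waiting on HTTP responses, passing executor_cls=concurrent.futures.ThreadPoolExecutor may also be worth trying, which avoids pickling the DataFrames between processes.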
