Why does multiprocessing not work with the Python Dash framework? (Python 3.6)

Time: 2019-06-15 09:23:33

Tags: python-3.x multiprocessing plotly-dash

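I am trying to use the multiprocessing library to split a dataframe into parts, process them on multiple CPU cores, and then concatenate the results back into a final dataframe inside a Python Dash application. The code works fine when I run it on its own, outside the Dash application. However, when I include the same code in the Dash application, I get an error. The code is shown below: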

I have tried the multiprocessing code outside the Dash framework, and it works absolutely fine.

import dash
from dash.dependencies import Input, Output, State
import dash_core_components as dcc
import dash_html_components as html
import flask
import dash_table_experiments as dt
import dash_table
import dash.dependencies

import base64
import time
import os

import pandas as pd

from docx import *
from docx.text.paragraph import Paragraph
from docx.text.paragraph import Run
import xml.etree.ElementTree as ET


import multiprocessing as mp
from multiprocessing import Pool

from docx.document import Document as doctwo
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph
import io
import csv
import codecs
import numpy as np

app = dash.Dash(__name__)
application = app.server
app.config.supress_callback_exceptions = True


app.layout = html.Div(children=[

    html.Div([
            html.Div([

                html.H4(children='Reader'),
                html.Br(),

            ],style={'text-align':'center'}),
            html.Br(),
            html.Br(),
            html.Div([

                dcc.Upload(html.Button('Upload File'),id='upload-data',style = dict(display = 'inline-block')),
                html.Br(),
            ]

            ),  
    html.Div(id='output-data-upload'),          

    ])


        ])


@app.callback(Output('output-data-upload', 'children'),
              [Input('upload-data', 'contents')],
              [State('upload-data', 'filename')])
def update_output(contents, filename):
    if contents is not None:
        content_type, content_string = contents.split(',')
        decoded = base64.b64decode(content_string)
        document = Document(io.BytesIO(decoded))

        combined_df = pd.read_csv('combined_df.csv')

        def calc_tfidf(input1): 
            input1 = input1.reset_index(drop=True)
            input1['samplecol'] = 'sample'
            return input1


        num_cores = mp.cpu_count() - 1   #number of cores on your machine
        num_partitions = mp.cpu_count() - 1 #number of partitions to split dataframe
        df_split = np.array_split(combined_df, num_partitions)
        pool = Pool(num_cores)
        df = pd.concat(pool.map(calc_tfidf, df_split))
        pool.close()
        pool.join()   

        return len(combined_df)

    else:
        return 'No File uploaded'

app.css.append_css({'external_url': 'https://codepen.io/plotly/pen/EQZeaW.css'})

if __name__ == '__main__':

    app.run_server(debug=True)

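The Dash app above accepts any file as input. After a file is uploaded in the front end, a local CSV file (any file; in my case combined_df.csv) is loaded into a dataframe. I then want to use multiprocessing to split the dataframe into parts, process them, and combine them back. However, the code above results in the following error: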

  

AttributeError: Can't pickle local object 'update_output.<locals>.calc_tfidf'

What is wrong with this code?

1 Answer:

Answer 0: (score: 0)

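Alright, I have figured it out now! The problem was that the function calc_tfidf was not defined as a global function. Because it was nested inside update_output, multiprocessing could not pickle it to send it to the worker processes. I changed it to a global (module-level) function, and it works fine.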
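For anyone hitting the same error, here is a minimal sketch of that fix, not the exact code from my app. To keep it dependency-free it uses a plain list of dicts in place of the dataframe; split_rows and process_in_parallel are hypothetical helper names made up for this example. With pandas, the same structure should apply with np.array_split and pd.concat, and the Dash callback would simply call process_in_parallel.

```python
import multiprocessing as mp


def calc_tfidf(rows):
    """Worker defined at module level, so pickle can reference it by name."""
    # Stand-in for the real processing: add a constant column to every row.
    return [dict(row, samplecol="sample") for row in rows]


def split_rows(rows, num_partitions):
    """Split rows into num_partitions consecutive, nearly equal chunks."""
    size, extra = divmod(len(rows), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks each take one additional row.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def process_in_parallel(rows, num_workers=None):
    """Split rows, process the chunks in worker processes, then recombine."""
    if num_workers is None:
        # Assumption: leave one core free, but never ask for zero workers.
        num_workers = max(1, mp.cpu_count() - 1)
    chunks = split_rows(rows, num_workers)
    with mp.Pool(num_workers) as pool:
        processed = pool.map(calc_tfidf, chunks)
    # Flatten the list of processed chunks back into a single list of rows.
    return [row for chunk in processed for row in chunk]


if __name__ == "__main__":
    data = [{"id": i} for i in range(10)]
    result = process_in_parallel(data)
    print(len(result), result[0])
```

Note the max(1, ...) guard: on a single-core machine, mp.cpu_count() - 1 is 0, and Pool(0) raises a ValueError.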