
I have a folder with many Excel files that I need to read into DataFrames. Each file is about 100-300 MB, and reading a single file takes several minutes.

While these files are being read, only 1 of my 8 CPU cores is in use.

How can I read them in parallel? This is what I wrote:

import multiprocessing as mp

import pandas as pd

def convert_file_to_pickled_df(path, fname):
    df = pd.read_excel(path + fname)
    # do some other things
    return df

path = 'D:/'
filenames = ['2017_1.xlsx', '2017_2.xlsx', '2017_3.xlsx', '2017_4.xlsx']

pool = mp.Pool(mp.cpu_count())
results = [pool.apply(convert_file_to_pickled_df, args=(path, fname)) for fname in filenames]
pool.close()

But it does not seem to work: still only 1 core is loaded.

How can I read multiple .xls files in parallel as DataFrames?
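
One likely cause, for context: `Pool.apply` blocks until its single call has returned, so a list comprehension built on it runs the tasks one after another even though a pool exists. Below is a minimal sketch of one alternative, `Pool.starmap`, which submits all tasks up front and waits once for all results. The helper names `read_one`, `build_tasks` and `read_all` are hypothetical and introduced only for illustration; the sketch also assumes the per-file work is CPU-bound and that each resulting DataFrame is small enough to be pickled back to the parent process.

```python
import multiprocessing as mp
import os


def read_one(path, fname):
    """Read a single Excel file into a DataFrame (runs in a worker process)."""
    # Imported inside the function so that importing this module
    # does not itself require pandas.
    import pandas as pd
    return pd.read_excel(os.path.join(path, fname))


def build_tasks(path, filenames):
    """Pair the shared folder with each file name: one argument tuple per task."""
    return [(path, fname) for fname in filenames]


def read_all(path, filenames, processes=None):
    """Read all files with a process pool; results keep the input order."""
    # processes=None lets Pool default to os.cpu_count() workers.
    with mp.Pool(processes) as pool:
        # starmap hands out every task before blocking, unlike apply,
        # which waits for each call to finish before the next one starts.
        return pool.starmap(read_one, build_tasks(path, filenames))


# Usage. On Windows the guard is needed, because worker processes
# re-import the main module when they start:
#
# if __name__ == "__main__":
#     dfs = read_all("D:/", ["2017_1.xlsx", "2017_2.xlsx"])
```

Whether this gives a real speedup depends on the machine: each worker holds a full DataFrame in memory, and sending 100-300 MB results back to the parent adds pickling overhead, so it may help to do the "other things" (filtering, aggregation) inside the worker and return a smaller object.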
