Question

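I am running some simple data analysis in Python on more than 500,000 files at a time. Analyzing each file takes about 1 second. My code works fine, but I would like to use parallel computing in Python to speed it up. Data size is not a problem, since each file is about 100 KB. It is the reading and writing of files inside the for loop that takes the time. This is what my code looks like.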

import pandas as pd
import numpy as np
from math import acos,sqrt,degrees,asin
import os
wd='C:/My_Folder/'
filelist=os.listdir(wd) # List of names of 500,000 files
for name in filelist:   # myfunction is defined elsewhere (not shown here)
    dt=pd.read_csv(wd+name)           # This line reads a csv file into a pandas data frame
    newdata=myfunction(dt)            # This function does some simple manipulation on the data frame and returns the new data frame
    newdata.to_csv('C:/Output/'+name) # This writes the new data frame into a new csv file

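The code works fine and takes about one second to process each file. But I see that my computer is using only one core. Can I speed this up by implementing parallelism?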
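For reference, one possible way to spread the loop above over several cores is a process pool from the standard library. This is only a minimal sketch, not a tested solution for this data set: `process_file` and `process_all` are hypothetical helper names, `myfunction` is a placeholder for the real transformation (which is not shown), and `chunksize=64` is an arbitrary guess. Since most of the time is reportedly spent reading and writing files, the actual speedup may be limited by the disk rather than by the number of cores.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from functools import partial

import pandas as pd


def myfunction(dt):
    # Placeholder (assumption): stands in for the real manipulation,
    # which is not shown in the question.
    return dt


def process_file(name, in_dir, out_dir):
    # Same three steps as the body of the original loop, for one file.
    dt = pd.read_csv(os.path.join(in_dir, name))
    newdata = myfunction(dt)
    newdata.to_csv(os.path.join(out_dir, name))
    return name


def process_all(in_dir, out_dir, workers=None):
    # workers=None lets the executor pick a default based on the CPU count.
    filelist = os.listdir(in_dir)
    worker = partial(process_file, in_dir=in_dir, out_dir=out_dir)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize sends file names to the workers in batches, which
        # reduces inter-process overhead when there are many small tasks.
        return list(pool.map(worker, filelist, chunksize=64))
```

On Windows the call has to sit under a main guard, for example `if __name__ == '__main__': process_all('C:/My_Folder/', 'C:/Output/')`, because worker processes re-import the script when they start.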

Ways to analyze data in parallel in Python pandas

0 answers: