I have been trying to read a large file and, after processing the data in the input file, write the results to another file at the same time. Each file is fairly large, around 4-8 GB. Is there a way to parallelize the process to save time?
The original program is:
with open(infile, "r") as filein:
    with open(writefile, "w") as filewrite:
        with open(errorfile, "w") as fileerror:
            count = 0
            filewrite.write("Time,Request,IP,MAC\n")
            for line in filein:  # iterate directly; the old readline() loop skipped the first line
                count += 1
                # print("{}: {}".format(count, line.strip()))  # testing content
                if requestp.search(line):
                    filewrite.write(line.strip()[:15] + ",")
                    filewrite.write(requestp.search(line).group() + ",")
                    if IP.search(line):
                        filewrite.write(IP.search(line).group())
                    filewrite.write(",")  # always write the separator so the CSV columns stay aligned
                    if MACp.search(line):
                        filewrite.write(MACp.search(line).group())
                    filewrite.write("\n")  # always terminate the row, even when no MAC matched
                else:
                    fileerror.write(line)
But this takes far too long for a file of that size, and I have 100 such files. I tried to parallelize the code with ipyparallel, but without success so far. Is there a way to do this?