I have several files. I want to read them, filter for some keywords, and write the results out to different files. I used Process(), and it turns out the read/write function is what takes most of the time. Do I need to separate the reading and the writing? How can I read multiple files at once and write the keywords from different files to different CSVs?
Thank you very much.
import csv
import time
from multiprocessing import Process

def readwritevalue():
    for file in gettxtpath():  ## gettxtpath will return a list of files
        file1 = file + ".csv"
        ## Identify some variables
        ## Read the file
        with open(file) as fp:
            for line in fp:
                # Process the data
                data1 = xxx
                data2 = xxx
                ...
        ## Write it to different files
        with open(file1, "w") as fp1:
            print(data1, file=fp1)
            w = csv.writer(fp1)
            w.writerow(data2)
            ...

if __name__ == '__main__':
    p = Process(target=readwritevalue)
    t1 = time.time()
    p.start()
    p.join()
I'd like to edit my question. I have more functions that modify the CSVs generated by readwritevalue(), so if Pool.map() works well, can all the remaining functions be changed like this? However, it doesn't seem to save much time.
from multiprocessing import Pool

def getFormated(file):  ## Merge each csv with a well-defined format csv, then generate a final report by writing all the csvs to one output csv
    csvMerge('Format.csv', file, file1)
    getResult()

if __name__ == "__main__":
    pool = Pool(2)
    pool.map(readwritevalue, [file for file in gettxtpath()])
    pool.map(getFormated, [file for file in getcsvName()])
    pool.map(Otherfunction, file_list)
    t1 = time.time()
    pool.close()
    pool.join()
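One likely reason this barely helps: each pool.map() call blocks until every file has finished that stage, so the three stages still run strictly one after another. A minimal sketch of an alternative, assuming readwritevalue and getFormated each accept a single file and that the csv name is just the txt name plus ".csv" (both are assumptions based on the snippets above):

from multiprocessing import Pool

def process_one_file(file):
    # Run the whole per-file pipeline as one task, so a worker can
    # move on to its next file without waiting for the other files.
    readwritevalue(file)
    getFormated(file + ".csv")  # assumes this naming matches getcsvName()

if __name__ == "__main__":
    with Pool(2) as pool:
        pool.map(process_one_file, gettxtpath())

Any step that needs all the csvs at once (such as the final merged report) still has to run after the pool finishes.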
Answer 0 (score: 0)
You can extract the body of the for loop into its own function, create a multiprocessing.Pool object, and then call pool.map() like this (I have used more descriptive names):
import csv
import multiprocessing

def read_and_write_single_file(stem):
    data = None
    with open(stem, "r") as f:
        ...  # populate data somehow
    csv_file = stem + ".csv"
    with open(csv_file, "w", encoding="utf-8", newline="") as f:
        w = csv.writer(f)
        for row in data:
            w.writerow(row)

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map(read_and_write_single_file, get_list_of_files())
See the linked documentation for how to control the number of workers, the number of tasks per worker, and so on.
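For example, a minimal sketch of those knobs (worker count, worker recycling, and batching), reusing the placeholder names from the snippet above:

import multiprocessing

# Cap the pool at 4 workers, and recycle each worker process after
# it has handled 50 tasks (helps if a task leaks memory).
pool = multiprocessing.Pool(processes=4, maxtasksperchild=50)

# chunksize hands each worker several files per task, reducing
# inter-process overhead when the file list is long.
result = pool.map(read_and_write_single_file, get_list_of_files(), chunksize=8)
pool.close()
pool.join()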
Answer 1 (score: 0)
I may have found the answer myself. Not sure it's really a good answer, but the runtime is 6 times shorter than before.
import multiprocessing as mp
import time
from multiprocessing import Pool

def readwritevalue(file):
    with open(file, 'r', encoding='UTF-8') as fp:
        ...  ## process the data
    file1 = file + ".csv"
    with open(file1, "w") as fp2:
        ...  ## write the data

if __name__ == "__main__":
    pool = Pool(processes=int(mp.cpu_count() * 0.7))  # use ~70% of the cores
    pool.map(readwritevalue, [file for file in gettxtpath()])
    t1 = time.time()
    pool.close()
    pool.join()
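A small caveat about the timing here: pool.map() blocks until every file is done, so reading time.time() only afterwards doesn't measure the work. A minimal sketch that brackets the call instead, using the same names as above:

import time

t0 = time.time()
pool.map(readwritevalue, gettxtpath())
print("elapsed:", time.time() - t0, "seconds")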