What kind of threads, and how many, should I use if I am doing the following?
My question is this:

Thread 1, running this procedure: wherever the last name in row_1 (the first row of base_file) matches a row among row_1-row_end (every row of huge_file), write out line_1 together with the matching rows from row_1-row_end (if there are any).
Thread 2, running this procedure: wherever the last name in row_2 (the second row of base_file) matches a row among row_1-row_end (every row of huge_file), write out line_2 together with the matching rows from row_1-row_end (if there are any).
Thread 3, running this procedure: wherever the last name in row_3 (the third row of base_file) matches a row among row_1-row_end (every row of huge_file), write out line_3 together with the matching rows from row_1-row_end (if there are any).
........
Thread 100, running this procedure: wherever the last name in row_100 (the 100th row of base_file) matches a row among row_1-row_end (every row of huge_file), write out line_100 together with the matching rows from row_1-row_end (if there are any).

These 100 or more threads would all be started at the same time. Is that possible?
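In other words, each base row fans out as an independent task that scans the whole huge file. A minimal sketch of that pattern with concurrent.futures, assuming base_rows and huge_rows have already been loaded into lists; match_one, match_all, and the column indices are placeholders for illustration, not code from the question:

    from concurrent.futures import ThreadPoolExecutor

    def match_one(base_row, huge_rows):
        # Scan every row of the huge file for this single base row.
        matches = []
        for row in huge_rows:
            if base_row[5] == row[7]:  # e.g. compare the two last-name columns
                matches.append(base_row + row)
        return matches

    def match_all(base_rows, huge_rows, workers=100):
        results = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(match_one, b, huge_rows) for b in base_rows]
            for fut in futures:
                results.extend(fut.result())
        return results

Starting 100 threads like this is possible, but fuzzy string scoring is CPU-bound, so with Python's GIL the threads mostly take turns; swapping ThreadPoolExecutor for ProcessPoolExecutor is what gives a real parallel speed-up.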
Answer 0 (score: 0)
I have working code that does the job step by step with a nested for loop, but it takes a very long time to run:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import csv
import codecs

count2 = 0  # running count of last-name matches, used only for progress output

with open("Director_1980.csv", 'wt', newline='') as f3:
    writer = csv.writer(f3)

    # Read the big contribution file into memory once.
    f2list = []
    with open("contribDB_1980final.csv", 'rt') as f2:  # This file is a big file
        reader = csv.reader(f2)
        for row in reader:
            f2list.append(row)

    # Compare every director line against every row of the big file.
    with codecs.open("director.csv", "r", encoding='utf-8', errors='ignore') as fdata:
        for line in fdata:
            line = line.split("|")
            lName = line[5]
            fName = line[1]
            mName = line[2]
            employer = line[6]
            for row in f2list:
                lName2 = row[7]
                fName2 = row[8]
                mName2 = row[9]
                employer2 = row[20]
                # Only keep pairs whose last names fully match.
                if fuzz.token_set_ratio(lName, lName2) == 100:
                    count2 = count2 + 1
                    print(count2)
                    lName_ratio = 100
                    fName_ratio = fuzz.token_set_ratio(fName, fName2)
                    mName_ratio = fuzz.token_set_ratio(mName, mName2)
                    employer_ratio = fuzz.token_set_ratio(employer, employer2)
                    new_line = line + row
                    new_line.insert(16, lName_ratio)
                    new_line.insert(18, fName_ratio)
                    new_line.insert(20, mName_ratio)
                    new_line.insert(32, employer_ratio)
                    writer.writerow(new_line)
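One way to cut the running time while keeping the same nested comparison is to spread the outer loop over several processes (the scoring is CPU-bound, so processes help where threads would be blocked by the GIL). This is only a sketch under that assumption: it reuses the file names and column indices from the code above, but the worker setup (Python 3.7+ for the initializer argument), the chunk size, and the helper names are placeholders and have not been tested against your data:

    from concurrent.futures import ProcessPoolExecutor
    import csv
    import codecs
    from fuzzywuzzy import fuzz

    _F2LIST = None  # read-only copy of the big file, set once in each worker process

    def _init_worker(f2list):
        global _F2LIST
        _F2LIST = f2list

    def score_line(line):
        # Compare one director line against every row of the big file.
        lName, fName, mName, employer = line[5], line[1], line[2], line[6]
        out = []
        for row in _F2LIST:
            if fuzz.token_set_ratio(lName, row[7]) == 100:
                new_line = line + row
                new_line.insert(16, 100)                                      # last-name ratio
                new_line.insert(18, fuzz.token_set_ratio(fName, row[8]))      # first-name ratio
                new_line.insert(20, fuzz.token_set_ratio(mName, row[9]))      # middle-name ratio
                new_line.insert(32, fuzz.token_set_ratio(employer, row[20]))  # employer ratio
                out.append(new_line)
        return out

    if __name__ == "__main__":
        with open("contribDB_1980final.csv", "rt") as f2:
            f2list = list(csv.reader(f2))
        with codecs.open("director.csv", "r", encoding="utf-8", errors="ignore") as fdata:
            director_lines = [line.split("|") for line in fdata]
        with open("Director_1980.csv", "wt", newline="") as f3:
            writer = csv.writer(f3)
            with ProcessPoolExecutor(initializer=_init_worker, initargs=(f2list,)) as pool:
                for matches in pool.map(score_line, director_lines, chunksize=16):
                    writer.writerows(matches)

On a machine with N cores this divides the fuzzy-matching work roughly N ways. Separately, if an exact (case-insensitive) last-name match is really what you need, pre-indexing f2list in a dict keyed by last name would remove the inner loop entirely and likely help more than any parallelism.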