I need help with the multithreading concept. I don't understand it very well. Can you help me with this?
Current approach:
Job seq_num
A 1
B 1
C 2
D 2
In the table above, you can see that we have divided the jobs into two groups based on seq_num; the jobs with seq_num 2 depend on the seq_num 1 jobs. This means seq_num 1 is triggered first, then seq_num 2. All jobs within a given seq_num run in parallel.
Suppose Job A = 10 mins
Job B = 15 mins
So the total completion time for seq_num 1 is 15 minutes. Only after those 15 minutes will seq_num 2 start.
Ideal scenario:
Job Job_Type seq_num
A independent 1
B independent 1
C A 2
D B 2
In the table above, you can see that job C depends on job A, and job D depends on job B. In the previous scenario, seq_num 2 started only after all of seq_num 1 had finished. But here I want a different approach.
Job A = 10 mins
Job B = 15 mins
Job C depends on A, so job C should start as soon as job A finishes; it should not wait for job B to complete. Likewise, D depends on B, so job D should start as soon as job B finishes.
Currently I am using multithreading to run the whole process based on seq_num, but I want the ideal scenario, which does not depend on seq_num. How can I make a dependent process wait until its parent process finishes? I am also sharing my code. Please let me know where I need to change it to get the ideal scenario. Let me know if you need more information.
Code:
def parallel_Execution():
    logging.info("parallel_Execution..................[started]")
    par_temp_loc = '/medaff/Temp/'
    '''Reading the metadata file and creating as a dataframe'''
    df = pd.read_csv(par_temp_loc+'metadata_file_imedical.txt', delimiter='|', error_bad_lines=False)
    uni_master_job = df['Master Job Name'].unique().tolist()
    print(uni_master_job)
    '''getting unique execution sequence'''
    logging.info("Getting the unique Execution Sequence Number!")
    unique_exec_seq = df['Execution Sequence'].unique().tolist()
    unique_exec_seq.sort()
    print(unique_exec_seq)
    num_unique_seq = len(unique_exec_seq)
    logging.info("Total Number of unique sequence Number : %2d" % (num_unique_seq))
    p2 = ThreadWithReturnValue(target=partial(parallel_temp2, unique_exec_seq, df))
    p2.start()
    r2 = p2.join()
    print(r2)
    #r1 = r1.append(r2)
    mail_df(r2)
'''Parallel Processing Function'''
def parallel_temp2(unique_exec_seq, df):
    list_df = []
    df_main4 = pd.DataFrame()
    for exec_seq in unique_exec_seq:
        seq_num = exec_seq
        temp_df = df[df['Execution Sequence'] == exec_seq].copy()
        unique_master_job = temp_df['Master Job Name'].unique().tolist()
        print(unique_master_job)
        #logging.info("%s Master Job Started." % (unique_master_job))
        if(len(unique_master_job) > 0):
            num_processes = len(unique_master_job)
            pool = ThreadPool(processes=num_processes)
            result1 = pool.map(partial(parallel_view_creation, exec_seq, temp_df), unique_master_job)
            pool.close()
            pool.join()
            df_main = pd.DataFrame(result1)
            #print("printing df_main")
            #print(df_main)
            for m_job in df_main.master_job.unique():
                temp_df1 = df_main[df_main['master_job'] == m_job]
                status = temp_df1.status.unique()[0]
                if(status == 0):
                    unique_master_job.remove(m_job)
            pool = ThreadPool(processes=num_processes)
            result2 = pool.map(partial(parallel_build_query, exec_seq, temp_df), unique_master_job)
            pool.close()
            pool.join()
            if(result2):
                df_main2 = pd.DataFrame(result2)
                df_main3 = pd.concat([df_main, df_main2], sort=False)
                status_df_list = df_main3['status'].unique().tolist()
                print(status_df_list)
                if(0 in status_df_list):
                    break
            if(0 in status_df_list):
                break
            else:
                df_main4 = df_main4.append(df_main3)
    if(0 in status_df_list):
        df_main4 = df_main4.append(df_main3)
    return df_main4
Code explanation:
First, I read the metadata file, which contains all the information about the jobs and their seq_num. Then I extract the unique jobs and the unique seq_num values and pass them to the ThreadWithReturnValue function.
In the parallel_temp2 function, I trigger the jobs based on their seq_num.
Thanks in advance!
Answer 0 (score: 0)
One possible way is to give each job a finished Event that its dependent jobs wait on. Minimal code:
import time
from random import randint
from threading import Thread, Event


class Job(Thread):
    def __init__(self, name, target, args=(), deps=None):
        super().__init__(name=name)
        self.result = None
        self.deps = deps
        self.target = target
        self.args = args
        self.finished = Event()

    def run(self):
        print(f"{self.name} waiting")
        # Wait for all dependent jobs to finish
        if self.deps:
            for dep in self.deps:
                dep.finished.wait()
        print(f"{self.name} running!")
        # Now we can start
        self.result = self.target(self.deps, *self.args)
        # Flag this job as done
        self.finished.set()
        print(f"{self.name} done")


def sleep_random_time(deps, *args):
    """
    Sleep random time and return the time slept

    Args:
        deps: a list of (direct) dependencies
    """
    if deps:
        for dep in deps:
            print(f"  Result of dependency {dep.name}={dep.result}")
    t = randint(2, 20)
    time.sleep(t)
    return t


def main():
    # Create our jobs. I am almost certain this can be done smarter with some
    # sort of tree but it highly depends on how/where you get your jobs
    jobs = []
    jobs.append(Job("A", sleep_random_time))
    jobs.append(Job("B", sleep_random_time))
    jobs.append(Job("C", sleep_random_time, deps=[jobs[0]]))
    jobs.append(Job("D", sleep_random_time, deps=[jobs[1]]))
    jobs.append(Job("E", sleep_random_time, deps=[jobs[1], jobs[2]]))

    for j in jobs:
        j.start()
    for j in jobs:
        j.join()


if __name__ == "__main__":
    main()
Output:
$ python3 ~/tmp/test.py
A waiting
A running!
B waiting
C waiting
D waiting
E waiting
B running!
A done
C running!
Result of dependency A=18
B done
D running!
Result of dependency B=18
D done
C done
E running!
Result of dependency B=18
Result of dependency C=20
E done
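The scheduling above hinges on threading.Event: wait() blocks the calling thread until another thread calls set(). As a standalone illustration of that parent/child handshake (this snippet is not part of the answer's code, just a minimal demonstration of the primitive):

```python
import threading
import time

done = threading.Event()
results = []

def parent():
    time.sleep(0.1)                 # simulate the parent job's work
    results.append("parent result")
    done.set()                      # signal that the parent finished

def child():
    done.wait()                     # blocks until parent calls set()
    results.append("child saw: " + results[0])

t_child = threading.Thread(target=child)
t_parent = threading.Thread(target=parent)
t_child.start()                     # starting the child first is safe: it just waits
t_parent.start()
t_parent.join()
t_child.join()
print(results)                      # ['parent result', 'child saw: parent result']
```

Even though the child thread is started first, the Event guarantees it cannot proceed until the parent has published its result, which is exactly the ordering the question asks for.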
Note: the version above starts one thread per job, and each thread blocks until its dependencies finish. If you instead want to limit the number of worker threads with a ThreadPoolExecutor, the Job definition needs a few modifications:
import time
from random import randint
from threading import Thread, Event
from concurrent.futures import ThreadPoolExecutor


class Job:
    def __init__(self, name, target, args=(), deps=None):
        self.result = None
        self.name = name
        self.deps = deps
        self.target = target
        self.args = args
        self.finished = Event()

    def can_run(self):
        if not self.deps:
            return True
        for dep in self.deps:
            if not dep.finished.is_set():
                return False
        return True

    def run(self):
        print(f"{self.name} running!")
        # Now we can start
        self.result = self.target(self.deps, *self.args)
        # Flag this job as done
        self.finished.set()
        print(f"{self.name} done")


def sleep_random_time(deps, *args):
    """
    Sleep random time and return the time slept

    Args:
        deps: a list of (direct) dependencies
    """
    if deps:
        for dep in deps:
            print(f"  Result of dependency {dep.name}={dep.result}")
    t = randint(2, 20)
    time.sleep(t)
    return t


def main():
    # Create our jobs. I am almost certain this can be done smarter with some
    # sort of tree but it highly depends on how/where you get your jobs
    jobs = []
    jobs.append(Job("A", sleep_random_time))
    jobs.append(Job("B", sleep_random_time))
    jobs.append(Job("C", sleep_random_time, deps=[jobs[0]]))
    jobs.append(Job("D", sleep_random_time, deps=[jobs[1]]))
    jobs.append(Job("E", sleep_random_time, deps=[jobs[1], jobs[2]]))

    with ThreadPoolExecutor(max_workers=2) as pool:
        while jobs:
            for j in jobs:
                if j.can_run():
                    pool.submit(j.run)
                    jobs.remove(j)
                    # break this for loop since we changed the list we are
                    # iterating
                    break
            time.sleep(0.2)


if __name__ == "__main__":
    main()
Output:
$ python3 ~/tmp/test.py
A running!
B running!
A done
C running!
Result of dependency A=8
C done
B done
D running!
Result of dependency B=12
E running!
Result of dependency B=12
Result of dependency C=4
D done
E done
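For completeness: on Python 3.9+, the standard library's graphlib.TopologicalSorter can drive the same "run as soon as dependencies finish" scheduling without the polling loop, by handing out ready nodes and being told when each one completes. A sketch under that assumption (run_job and the job graph below are stand-ins mirroring the question's ideal-scenario table, not the asker's real jobs):

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from graphlib import TopologicalSorter

def run_job(name):
    time.sleep(0.05)   # stand-in for the real job's work
    return name

# Map each job to the set of jobs it depends on
# (C needs A, D needs B, as in the question's ideal scenario).
graph = {"A": set(), "B": set(), "C": {"A"}, "D": {"B"}}

order = []
ts = TopologicalSorter(graph)
ts.prepare()
with ThreadPoolExecutor(max_workers=4) as pool:
    running = {}
    while ts.is_active():
        # Submit every job whose dependencies have all completed
        for node in ts.get_ready():
            running[pool.submit(run_job, node)] = node
        # Block until at least one running job finishes
        done, _ = wait(running, return_when=FIRST_COMPLETED)
        for fut in done:
            node = running.pop(fut)
            order.append(fut.result())
            ts.done(node)   # may make dependent jobs ready

print(order)   # A always precedes C, and B always precedes D
```

The wait(..., return_when=FIRST_COMPLETED) call replaces the time.sleep(0.2) polling: the scheduler wakes up exactly when a job finishes, so C and D start as soon as A and B (respectively) are done.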