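I want to be able to write code in my Snakefile that executes only the first time the Snakefile is invoked, and is not executed when snakemake re-runs the Snakefile as a sub-instance because I specified the -j option to use multiple cores. How can I do this?
I am not talking about workflow code, but about Python code in the Snakefile that performs various tasks related to preparing to declare the workflow rules.
I have several places where I want to do this, some because the work does not need to be done more than once, and I want to speed things up by running it only on the first, initial invocation of the Snakefile. For example, part of my Snakefile code checks whether the user has edited certain pipeline include files (files that are not inputs or outputs of the actual pipeline) and, if so, backs them up. I do not want every sub-instance to scan the dates of all these files and back them up when necessary. In fact, there is a race condition, with multiple instances trying to back up the same file.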
Answer 0 (score: 0)
I found a way to do it.
# Create Boolean variable isFirstInstance, True if this is the first snakemake
# instance of a run of snakemake, False if it is nested sub-instance.
#
# This determines whether or not this is the first snakemake instance by creating
# a unique file with each initial run of the snakefile, whose name is created
# much as tempFile() creates files, but we don't use tempFile() because we don't
# want to delete this file when any instance exits, only when the first instance
# exits. The file name includes the process group ID, which will be the same
# for the first instance and for sub-instances. The file contains one line, the
# process ID of its creator. If the file doesn't exist, it is created and we
# set the variable isFirstInstance True to indicate that this is the first
# instance of the pipeline. If the file exists and the process ID it contains
# matches the process ID of one of the parents of the current process, then the
# current process is not the first instance of this pipeline invocation, and
# so we set isFirstInstance False. Two other aberrant situations can arise.
# First, if the file exists and its contained process ID matches the process ID
# of THIS process, we presume that the file was for some reason not deleted from
# a previous run, and that run happened to have a process group ID and process
# ID matching the current one, and so we assume we are first instance, and we
# delete the file and recreate it so its date matches the current date. Second,
# if the file exists and DOES NOT contain the process ID of one of our parents,
# we make the same presumption of undeleted old file, and again delete the file,
# then rewrite it with our process ID.
################################################################################
import os
import psutil

# Create file name containing our process group ID in the name.
# TMP_DIR is expected to be defined earlier in the Snakefile.
initialInstancePIDfile = TMP_DIR + "/initialInstancePID." + str(os.getpgrp()) + ".tmp"

# If file doesn't exist, this is first instance. Create the file.
myPID = str(os.getpid())
if not os.path.exists(initialInstancePIDfile):
    with open(initialInstancePIDfile, "wt") as f:
        f.write(myPID)
    isFirstInstance = True
    #print("Instance file does not exist, created it:", initialInstancePIDfile, "and myPID =", myPID)
else:
    # Otherwise, read the process ID from the file and see if it matches ours.
    # strip() guards against a trailing newline or an empty file.
    with open(initialInstancePIDfile, "rt") as f:
        fPID = f.readline().strip()
    if fPID == myPID:
        with open(initialInstancePIDfile, "wt") as f:
            f.write(myPID)
        isFirstInstance = True
        print("Instance file existed already, with our PID: ", myPID, " so we presumed it was a leftover and deleted and recreated it.")
    else:
        isFirstInstance = None
        # It doesn't match ours, does it match one of our parents?
        try:
            lastPID = None
            parentPID = myPID
            # Walk up the process tree until the parent PID stops changing.
            while parentPID != lastPID:
                lastPID = parentPID
                parentPID = str(psutil.Process(int(lastPID)).ppid())
                #print("Parent ID is:", parentPID)
                if parentPID == fPID:
                    isFirstInstance = False
                    #print("Instance file contains the PID of one of our parents:", fPID, initialInstancePIDfile, "and myPID =", myPID)
                    break
        except (psutil.Error, ValueError):
            # We ran past the top of the process tree, or a process vanished.
            pass
        # If it doesn't match a parent either, it is a leftover file from a
        # previous invocation. Replace it with a new file.
        if isFirstInstance is None:
            with open(initialInstancePIDfile, "wt") as f:
                f.write(myPID)
            isFirstInstance = True
            print("Instance file existed already, with a PID:", fPID, "not matching ours:", myPID,
                  "or a parent, so we presumed it was a leftover and deleted and recreated it.")

if isFirstInstance:
    print("Initial pipeline instance running.")
else:
    print("Pipeline sub-instance running.")
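Two things the comments above describe are not spelled out in the code: removing the file when the first instance exits, and avoiding the small window between checking that the file exists and creating it. The following is only a sketch of one way to cover both, not part of the original answer. The helper names claim_first_instance and remove_marker are made up for illustration, and a throwaway temporary directory stands in for TMP_DIR. It assumes a Unix-like system, as os.getpgrp() already does.

```python
import atexit
import os
import tempfile


def claim_first_instance(marker_path):
    """Return True if this process created marker_path, False if it already existed.

    O_CREAT | O_EXCL makes the existence check and the creation a single
    atomic step, so two processes cannot both believe they created the file.
    """
    try:
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def remove_marker(marker_path):
    """Delete the marker file, ignoring the case where it is already gone."""
    try:
        os.remove(marker_path)
    except FileNotFoundError:
        pass


# Hypothetical usage: in the Snakefile the directory would be TMP_DIR.
marker_dir = tempfile.mkdtemp()
marker = os.path.join(marker_dir, "initialInstancePID." + str(os.getpgrp()) + ".tmp")

isFirstInstance = claim_first_instance(marker)
if isFirstInstance:
    # Only the creator registers the cleanup, so sub-instances never delete the file.
    atexit.register(remove_marker, marker)
```

An atexit handler does not run if the process is killed outright, so a leftover file is still possible. When claim_first_instance returns False, the PID and parent checks from the answer are still needed to tell a real sub-instance from a stale file.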