How to write Snakefile code that runs only in the first instance of possibly multiple sub-instances

Time: 2018-10-03 18:38:01

Tags: multiple-instances snakemake

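I would like to write code in my Snakefile that executes only when the Snakefile is first invoked, and that is not executed when snakemake re-runs the Snakefile as a sub-instance because I specified the -j option to use multiple cores. How can I do this?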

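I am not talking about workflow code, but about the Python code in the Snakefile that performs various tasks related to preparing to state the workflow rules.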

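There are several places where I want to do this. In some, the work does not need to be done more than once, and I want to speed up the Snakefile by doing it only on the initial invocation. For example, part of my Snakefile code checks whether the user has edited certain pipeline include files (not the input and output files of the actual pipeline) and, if so, backs them up. I do not want every sub-instance to scan the dates of all those files and back them up when necessary. In fact, there is a race condition: multiple instances try to back up the same file.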
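For concreteness, the backup step described above might look roughly like the following sketch. The function name `backup_if_edited` and the `.bak` suffix are hypothetical, chosen only for illustration. Running this from several instances at once is where the race arises, since each instance may decide to copy the same file.

```python
import os
import shutil

def backup_if_edited(path, backup_suffix=".bak"):
    """Copy `path` to `path + backup_suffix` if it is newer than the backup.

    Hypothetical helper: returns True if a backup was written, else False.
    """
    backup = path + backup_suffix
    # An existing backup that is at least as new as the file means no edit
    # has happened since the last backup.
    if os.path.exists(backup) and os.path.getmtime(backup) >= os.path.getmtime(path):
        return False
    # copy2 preserves the modification time, so a second call is a no-op
    # until the file is edited again.
    shutil.copy2(path, backup)
    return True
```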

1 Answer:

Answer 0: (score: 0)

I found a way.

# Create Boolean variable isFirstInstance, True if this is the first snakemake
# instance of a run of snakemake, False if it is nested sub-instance.
#
# This determines whether or not this is the first snakemake instance by creating
# a unique file with each initial run of the snakefile, whose name is created
# much as tempFile() creates files, but we don't use tempFile() because we don't
# want to delete this file when any instance exits, only when the first instance
# exits.  The file name includes the process group ID, which will be the same
# for the first instance and for sub-instances.  The file contains one line, the
# process ID of its creator.  If the file doesn't exist, it is created and we
# set the variable isFirstInstance True to indicate that this is the first
# instance of the pipeline.  If the file exists and the process ID it contains
# matches the process ID of one of the parents of the current process, then the
# current process is not the first instance of this pipeline invocation, and
# so we set isFirstInstance False.  Two other aberrant situations can arise.
# First, if the file exists and its contained process ID matches the process ID
# of THIS process, we presume that the file was for some reason not deleted from
# a previous run, and that run happened to have a process group ID and process
# ID matching the current one, and so we assume we are first instance, and we
# delete the file and recreate it so its date matches the current date.  Second,
# if the file exists and DOES NOT contain the process ID of one of our parents,
# we make the same presumption of undeleted old file, and again delete the file,
# then rewrite it with our process ID.
################################################################################

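import os
import psutil  # third-party package, used below to walk up the parent process chain

# Create file name containing our process group ID in the name.
# TMP_DIR is assumed to be defined earlier in the Snakefile.
initialInstancePIDfile = TMP_DIR + "/initialInstancePID." + str(os.getpgrp()) + ".tmp"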

# If file doesn't exist, this is first instance.  Create the file.
myPID = str(os.getpid())
if not os.path.exists(initialInstancePIDfile):
    f = open(initialInstancePIDfile, "wt")
    f.write(myPID)
    f.close()
    isFirstInstance = True
    #print("Instance file does not exist, created it:", initialInstancePIDfile, "and myPID =", myPID)
else:
    # Otherwise, read the process ID from the file and see if it matches ours.
    f = open(initialInstancePIDfile, "rt")
    # readline().strip() tolerates an empty file or a trailing newline.
    fPID = f.readline().strip()
    f.close()
    if fPID == myPID:
        f = open(initialInstancePIDfile, "wt")
        f.write(myPID)
        f.close()
        isFirstInstance = True
        print("Instance file existed already, with our PID: ", myPID, " so we presumed it was a leftover and deleted and recreated it.")
    else:
        isFirstInstance = None
        # It doesn't match ours, does it match one of our parents?
        try:
            lastPID = None
            parentPID = myPID
            while parentPID != lastPID:
                lastPID = parentPID
                parentPID = str(psutil.Process(int(lastPID)).ppid())
                #print("Parent ID is:", parentPID)
                if parentPID == fPID:
                    isFirstInstance = False
                    #print("Instance file contains the PID of one of our parents:", fPID, initialInstancePIDfile, "and myPID =", myPID)
                    break
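        except psutil.Error:
            # Reached the top of the process tree, or a process could not be
            # inspected; stop searching.
            pass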
        # If it doesn't match a parent either, it is a leftover file from a
        # previous invocation.  Replace it with a new file.
        if isFirstInstance is None:
            f = open(initialInstancePIDfile, "wt")
            f.write(myPID)
            f.close()
            isFirstInstance = True
            print("Instance file existed already, with a PID:", fPID, "not matching ours:", myPID,
                "or a parent, so we presumed it was a leftover and deleted and recreated it.")
if isFirstInstance:
    print("Initial pipeline instance running.")
else:
    print("Pipeline sub-instance running.")
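The comments above say the PID file should be deleted when the first instance exits, but the code shown does not do that. One possible way to handle it, offered only as a sketch, is to register a cleanup with Python's standard `atexit` module. The helper name `register_first_instance_cleanup` is hypothetical, and `atexit` handlers do not run if the process is killed by a signal or exits via `os._exit`, which is why the leftover-file handling above is still needed.

```python
import atexit
import os

def register_first_instance_cleanup(pid_file, is_first_instance):
    """Remove `pid_file` at interpreter exit, but only in the first instance.

    Hypothetical helper: returns the cleanup callable, or None for sub-instances.
    """
    if not is_first_instance:
        return None

    def _remove():
        try:
            os.remove(pid_file)
        except FileNotFoundError:
            # Already gone; nothing to do.
            pass

    atexit.register(_remove)
    return _remove

# Possible use in the Snakefile, after the detection code above:
#
#   register_first_instance_cleanup(initialInstancePIDfile, isFirstInstance)
#   if isFirstInstance:
#       ...  # one-time setup, e.g. backing up edited include files
```

Sub-instances skip both the cleanup registration and the one-time setup, so only the initial invocation touches the include files.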