我需要在snakemake中使用嵌套的检查点,因为对于每个动态文件,我都必须再次创建其他动态文件。到目前为止,我无法正确解决这两个检查点。在下面,您可以找到一个最小的玩具示例。
似乎直到第一个签出未得到正确解决,第二个检查点才被执行,因此单个聚合规则将无法工作。 我不知道如何调用两个检查点并解决通配符。
import os.path
import glob
rule all:
input:
'collect/all_done.txt'
#generate a number of files
checkpoint create_files:
output:
directory('files')
run:
import random
r = random.randint(1,10)
for x in range(r):
output_dir = output[0] + '/' + str(x+1)
import os
if not os.path.isdir(output_dir):
os.makedirs(output_dir, exist_ok=True)
output_file=output_dir + '/test.txt'
print(output_file)
with open(output_file, 'w') as f:
f.write(str(x+1))
checkpoint create_other_files:
input: 'files/{i}/test.txt'
output: directory('other_files/{i}/')
shell:
'''
L=$(( $RANDOM % 10))
for j in $(seq 1 $L);
do
mkdir -p {output}/{j}
cp -f {input} {output}/$j/test2.txt
done
'''
def aggregate(wildcards):
i_wildcard = checkpoints.create_files.get(**wildcards).output[0]
print('in_def_aggregate')
print(i_wildcard)
j_wildcard = checkpoints.create_other_files.get(**wildcards).output[0]
print(j_wildcard)
split_files = expand('other_files/{i}/{j}/test2.txt',
i =glob_wildcards(os.path.join(i_wildcard, '{i}/test.txt')).i,
j = glob_wildcards(os.path.join(j_wildcard, '{j}/test2.txt')).j
)
return split_files
#non-sense collect function
rule collect:
input: aggregate
output: touch('collect/all_done.txt')
当前,我从snakemake得到以下错误:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 collect
1 create_files
3
[Thu Nov 14 14:45:01 2019]
checkpoint create_files:
output: files
jobid: 2
Downstream jobs will be updated after completion.
Job counts:
count jobs
1 create_files
1
files/1/test.txt
files/2/test.txt
files/3/test.txt
files/4/test.txt
files/5/test.txt
files/6/test.txt
files/7/test.txt
files/8/test.txt
files/9/test.txt
files/10/test.txt
Updating job 1.
in_def_aggregate
files
[Thu Nov 14 14:45:02 2019]
Error in rule create_files:
jobid: 2
output: files
InputFunctionException in line 53 of /TL/stat_learn/work/feldmann/Phd/Projects/HIVImmunoAdapt/HIVIA/playground/Snakefile2:
WorkflowError: Missing wildcard values for i
Wildcards:
Removing output files of failed job create_files since they might be corrupted:
files
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
我对拥有/ other_files / {checkpoint_1_wildcard} / {checkpoint_2_wildcard} /test2.txt
文件感兴趣答案 0 :(得分:1)
我不确定您要做什么,所以我重写了一些内容。但是是否可以解决问题?
import glob
import random
from pathlib import Path
rule all:
input:
'collect/all_done.txt'
checkpoint first:
output:
directory('first')
run:
for i in range(random.randint(1,10)):
Path(f"{output[0]}/{i}").mkdir(parents=True, exist_ok=True)
Path(f"{output[0]}/{i}/test.txt").touch()
checkpoint second:
input:
'first/{i}/test.txt'
output:
directory('second/{i}')
run:
for j in range(random.randint(1,10)):
Path(f"{output[0]}/{j}").mkdir(parents=True, exist_ok=True)
Path(f"{output[0]}/{j}/test2.txt").touch()
rule copy:
input:
'second/{i}/{j}/test2.txt'
output:
'copy/{i}/{j}/test2.txt'
shell:
"""
cp -f {input} {output}
"""
def aggregate(wildcards):
outputs_i = glob.glob(f"{checkpoints.first.get().output}/*/")
outputs_i = [output.split('/')[-2] for output in outputs_i]
split_files = []
for i in outputs_i:
outputs_j = glob.glob(f"{checkpoints.second.get(i=i).output}/*/")
outputs_j = [output.split('/')[-2] for output in outputs_j]
for j in outputs_j:
split_files.append(f"copy/{i}/{j}/test2.txt")
return split_files
rule collect:
input:
aggregate
output:
touch('collect/all_done.txt')