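I have three single-cell BAM files from 3 different samples that I need to split into smaller BAMs by cluster. I then need to merge the BAM files that belong to the same cluster across the different samples. I tried using a checkpoint, but I am a bit lost. https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html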
This is a continuation of a question I posted earlier: split bam files to (variable) pre-defined number of small bam files depending on the sample
SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" : [ "1" ], "SampleC" : [ "1", "2" ] }
CLUSTERS = []
for sample in SAMPLE_cluster:
    CLUSTERS.extend(SAMPLE_cluster[sample])
CLUSTERS = sorted(set(CLUSTERS))
rule all:
    input: expand("01merged_bam/{cluster_id}.bam", cluster_id = CLUSTERS)
checkpoint split_bam:
    input: "{sample}.bam"
    output: directory("01split_bam/{sample}/")
    shell:
        """
        split_bam.sh {input}
        """
## the split_bam.sh will split the bam file to "01split_bam/{sample}/{sample}_{cluster_id}.bam"
def merge_bam_input(wildcards):
    checkpoint_output = checkpoints.split_bam.get(**wildcards).output[0]
    return expand("01split_bam/{sample}/{sample}_{{cluster_id}}.bam", \
        sample = glob_wildcards(os.path.join(checkpoint_output, "{sample}_{cluster_id}.bam")).sample)
rule merge_bam_per_cluster:
    input: merge_bam_input
    output: "01merged_bam/{cluster_id}.bam"
    log: "00log/{cluster_id}.merge_bam.log"
    threads: 2
    shell:
        """
        samtools merge -@ 2 -r {output} {input}
        """
Depending on the cluster number, the input of rule merge_bam_per_cluster changes:
For example, for cluster 1: "01split_bam/SampleA/SampleA_1.bam", "01split_bam/SampleB/SampleB_1.bam", "01split_bam/SampleC/SampleC_1.bam".
For cluster 2: "01split_bam/SampleA/SampleA_2.bam", "01split_bam/SampleC/SampleC_2.bam".
For cluster 3: "01split_bam/SampleA/SampleA_3.bam".
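The per-cluster inputs listed above follow directly from SAMPLE_cluster. As a minimal sketch in plain Python (no Snakemake required; the function name expected_merge_inputs is hypothetical and only used for illustration), they could be computed like this:

```python
# Same mapping as in the question: sample name -> cluster ids present in that sample
SAMPLE_cluster = {"SampleA": ["1", "2", "3"], "SampleB": ["1"], "SampleC": ["1", "2"]}

def expected_merge_inputs(cluster_id, sample_to_clusters=SAMPLE_cluster):
    """List the split BAM paths that should be merged for one cluster."""
    return [
        f"01split_bam/{sample}/{sample}_{cluster_id}.bam"
        for sample, clusters in sample_to_clusters.items()
        if cluster_id in clusters  # skip samples that do not contain this cluster
    ]

for cluster_id in ["1", "2", "3"]:
    print(cluster_id, expected_merge_inputs(cluster_id))
```

The path layout is taken from the comment on split_bam.sh above; if the script writes files elsewhere, the template string would need to change accordingly.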
Answer 0 (score: 1)
I decided not to use a checkpoint and instead use an input function to get the inputs:
SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" : [ "1" ], "SampleC" : [ "1", "2" ] }
# reverse the mapping (sample names must match the keys of SAMPLE_cluster exactly)
cluster_sample = {'1':['SampleA','SampleB','SampleC'], '2':['SampleA', 'SampleC'], '3':['SampleA']}
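The reversed mapping does not have to be typed by hand. A minimal sketch in plain Python, assuming SAMPLE_cluster has the shape shown in this post (the helper name invert_mapping is illustrative and not part of Snakemake):

```python
from collections import defaultdict

SAMPLE_cluster = {"SampleA": ["1", "2", "3"], "SampleB": ["1"], "SampleC": ["1", "2"]}

def invert_mapping(sample_to_clusters):
    """Return a dict mapping each cluster id to the samples that contain it."""
    cluster_to_samples = defaultdict(list)
    for sample, clusters in sample_to_clusters.items():
        for cluster_id in clusters:
            cluster_to_samples[cluster_id].append(sample)
    # Convert back to a plain dict so a missing cluster id raises KeyError
    return dict(cluster_to_samples)

cluster_sample = invert_mapping(SAMPLE_cluster)
```

Deriving cluster_sample this way keeps the two dictionaries in sync and avoids spelling mismatches between sample names.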
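rule split_bam:
    input: "{sample}.bam"
    # one marker file per sample, so the {sample} wildcard can be resolved from the output
    output: "01split_bam/{sample}/split.touch"
    shell:
        """
        split_bam {input}
        touch {output}
        """
rule index_split_bam:
    input: "01split_bam/{sample}/split.touch"
    # path matches what get_merge_bam_input requests
    output: "01split_bam/{sample}/{sample}_{cluster_id}.bam.bai"
    shell:
        """
        samtools index 01split_bam/{wildcards.sample}/{wildcards.sample}_{wildcards.cluster_id}.bam
        """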
def get_merge_bam_input(wildcards):
    samples = cluster_sample[wildcards.cluster_id]
    return expand("01split_bam/{sample}/{sample}_{{cluster_id}}.bam.bai", sample = samples)
rule merge_bam_per_cluster:
    input: get_merge_bam_input
    output: "01merged_bam/{cluster_id}.bam"
    params:
        bam = lambda wildcards, input: " ".join(input).replace(".bai", "")
    log: "00log/{cluster_id}.merge_bam.log"
    threads: 2
    shell:
        """
        samtools merge -@ 2 -r {output} {params.bam}
        """
It seems to be working.
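The params.bam expression relies on every index path being the BAM path plus ".bai". A small sketch of that string handling outside Snakemake (the helper name bai_to_bam is illustrative): it strips only a trailing ".bai", whereas str.replace would also remove ".bai" if it happened to occur elsewhere in a path.

```python
def bai_to_bam(index_paths):
    """Turn '<name>.bam.bai' paths into one space-separated string of BAM paths."""
    suffix = ".bai"
    bam_paths = [
        path[: -len(suffix)] if path.endswith(suffix) else path
        for path in index_paths
    ]
    return " ".join(bam_paths)

inputs = [
    "01split_bam/SampleA/SampleA_2.bam.bai",
    "01split_bam/SampleC/SampleC_2.bam.bai",
]
bam_args = bai_to_bam(inputs)

# For these paths the result equals the expression used in the rule
assert bam_args == " ".join(inputs).replace(".bai", "")
```

For the directory layout used in this post both variants give the same string, so the original lambda is fine as long as no directory or sample name contains ".bai".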