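For clarity, I had to revise this post. Initially the pipeline ran fine on my local machine but failed when submitted to the cluster. After posting the question, I found that my snakemake version was 3.13.3, so I updated to v5.7.3, and now it fails both on the local machine and on the cluster. So I am now trying to work out whether the problem lies in my Snakefile or somewhere else.
Error message: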
Waiting at most 5 seconds for missing files.
MissingOutputException in line 24 of /work/path/rna_seq_pipeline/Snakefile:
Missing files after 5 seconds:
bam/A2_Aligned.toTranscriptome.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work/path/rna_seq_pipeline/.snakemake/log/2019-11-07T153434.327966.snakemake.log
So there may be something wrong with my Snakefile. Here is my Snakefile:
# config file
configfile: "config.yaml"
shell.prefix("source ~/.bash_profile")
# determine which genome reference you would like to use
# here we are using GRCm38
# depending on the freeze, the appropriate references and data files will be chosen from the config
freeze = config['freeze']
# read list of samples, one per line
with open(config['samples']) as f:
    SAMPLES = f.read().splitlines()
rule all:
    input:
        starindex = config['reference']['stargenomedir'][freeze] + "/" + "SAindex",
        rsemindex = config['reference']['rsemgenomedir'][freeze] + ".n2g.idx.fa",
        fastqs = expand("fastq/{file}_{rep}_paired.fq.gz", file = SAMPLES, rep = ['1','2']),
        bams = expand("bam/{file}_Aligned.toTranscriptome.out.bam", file = SAMPLES),
        quant = expand("quant/{file}.genes.results", file = SAMPLES)
# align using STAR
rule star_align:
    input:
        f1 = "fastq/" + "{file}_1_paired.fq.gz",
        f2 = "fastq/" + "{file}_2_paired.fq.gz"
    output:
        out = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    params:
        star = config['tools']['star'],
        genomedir = config['reference']['stargenomedir'][freeze],
        prefix = "bam/" + "{file}_"
    threads: 12
    shell:
        """
        {params.star} \
        --runThreadN {threads} \
        --genomeDir {params.genomedir} \
        --readFilesIn {input.f1} {input.f2} \
        --readFilesCommand zcat \
        --outFileNamePrefix {params.prefix} \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --quantMode TranscriptomeSAM \
        --outSAMattributes NH HI AS NM MD \
        --outFilterType BySJout \
        --outFilterMultimapNmax 20 \
        --outFilterMismatchNmax 999 \
        --outFilterMismatchNoverReadLmax 0.04 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000 \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --sjdbScore 1 \
        --limitBAMsortRAM 50000000000
        """
# quantify expression using RSEM
rule rsem_quant:
    input:
        bam = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    output:
        quant = "quant/" + "{file}.genes.results"
    params:
        calcexp = config['tools']['rsem']['calcexp'],
        genomedir = config['reference']['rsemgenomedir'][freeze],
        prefix = "quant/" + "{file}"
    threads: 12
    shell:
        """
        {params.calcexp} \
        --paired-end \
        --no-bam-output \
        --quiet \
        --no-qualities \
        -p {threads} \
        --forward-prob 0.5 \
        --seed-length 21 \
        --fragment-length-mean -1.0 \
        --bam {input.bam} {params.genomedir} {params.prefix}
        """
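When chasing a MissingOutputException like the one above, it can help to list which of the expected targets actually exist on disk after a run. The following is a minimal sketch in plain Python, assuming the same naming patterns that `rule all` uses for the BAM and quantification files; `expected_targets` and `missing_targets` are hypothetical helper names, not part of Snakemake.

```python
import os


def expected_targets(samples):
    """Build the per-sample paths requested by `rule all` (same patterns as the Snakefile)."""
    targets = []
    for sample in samples:
        targets.append(f"bam/{sample}_Aligned.toTranscriptome.out.bam")
        targets.append(f"quant/{sample}.genes.results")
    return targets


def missing_targets(samples, root="."):
    """Return the expected targets that do not exist under `root`."""
    return [
        target
        for target in expected_targets(samples)
        if not os.path.exists(os.path.join(root, target))
    ]


if __name__ == "__main__":
    # Sample names taken from samples.txt in the question.
    for path in missing_targets(["A1", "A2"]):
        print("missing:", path)
```

Running it from the pipeline directory right after a failed job shows whether the tool wrote nothing at all or wrote files under different names than the rule declares.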
And my config.yaml:
freeze: grcm38
# samples file
samples:
  samples.txt
# software, binaries or tools
tools:
  fastqdump: fastq-dump
  star: STAR
  rsem:
    calcexp: rsem-calculate-expression
    prepref: rsem-prepare-reference
# reference files, genome indices and data
reference:
  stargenomedir:
    grch38: /work/path/reference/STAR/GRCh38
    grcm38: /work/path/reference/STAR/GRCm38
  rsemgenomedir:
    grch38: /work/path/reference/RSEM/GRCh38/GRCh38
    grcm38: /work/path/reference/RSEM/GRCm38/GRCm38
  fasta:
    grch38: /work/path/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa
  gtf:
    grch38: /work/path/reference/GRCh38/Homo_sapiens.GRCh38.98.gtf
    grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.98.gtf
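Because the Snakefile indexes every reference map with the value of `freeze`, a mismatch (the keys are case-sensitive, so `GRCm38` would not match `grcm38`) only surfaces as a KeyError when the workflow is parsed. A small sketch of an up-front check, assuming the config has already been parsed into a dictionary; `check_freeze` is a hypothetical helper and the dictionary below is a trimmed stand-in for the real file:

```python
def check_freeze(config):
    """Return a list of problems with the `freeze` key and the reference maps."""
    freeze = config.get("freeze")
    if freeze is None:
        return ["config has no 'freeze' key"]
    problems = []
    for section in ("stargenomedir", "rsemgenomedir", "fasta", "gtf"):
        entries = config.get("reference", {}).get(section, {})
        if freeze not in entries:
            problems.append(f"reference.{section} has no entry for {freeze!r}")
    return problems


if __name__ == "__main__":
    # Trimmed stand-in for the parsed config.yaml (values copied from the question).
    config = {
        "freeze": "grcm38",
        "reference": {
            "stargenomedir": {"grcm38": "/work/path/reference/STAR/GRCm38"},
            "rsemgenomedir": {"grcm38": "/work/path/reference/RSEM/GRCm38/GRCm38"},
            "fasta": {"grcm38": "/work/path/reference/GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa"},
            "gtf": {"grcm38": "/work/path/reference/GRCm38/Mus_musculus.GRCm38.98.gtf"},
        },
    }
    print(check_freeze(config) or "freeze key is consistent")
```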
Finally, samples.txt:
A1
A2
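One detail worth ruling out: `f.read().splitlines()` keeps blank lines and trailing spaces, so a stray empty line in samples.txt would become an empty sample name and expand to a path such as `bam/_Aligned.toTranscriptome.out.bam`. This is not necessarily the cause of the error here. A more defensive reader might look like the sketch below, where `read_samples` is a hypothetical helper name:

```python
def read_samples(path):
    """Read one sample name per line, dropping blank lines and surrounding whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Assumes a samples.txt in the current directory, as in the question.
    print(read_samples("samples.txt"))
```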
Any suggestions?
PS: adapted from the pipeline at https://github.com/komalsrathi/rnaseq-star-rsem-pipeline/blob/master/Snakefile
Answer 0 (score: 0)
When touch reports "cannot touch: : No such file or directory", it usually means the directory structure does not exist. What happens if you try:
touch /work/path/rna_seq_pipeline/.snakemake/tmp.o_2ffebs/1.jobfailed
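The missing-files line suggests that you are trying to store output in the .snakemake folder. Is that really the case? What happens if you move it to, for example, the current working directory?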
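The point about touch failing when the parent directory is missing can be reproduced outside Snakemake. A minimal sketch using only the Python standard library; `touch_with_parents` is a hypothetical helper name, and the path in the usage example is purely illustrative:

```python
from pathlib import Path


def touch_with_parents(path):
    """Create any missing parent directories, then create the file if it is absent."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch(exist_ok=True)
    return target


if __name__ == "__main__":
    # Illustrative path only; a bare touch would fail here if "scratch/tmp" did not exist.
    print(touch_with_parents("scratch/tmp/1.jobfailed"))
```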
Answer 1 (score: 0)
The same question was posted on Biostars and answered there: https://www.biostars.org/p/406693/#406907