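I'll demonstrate with a simple example. I have two files, a.map and a.ped, both in PLINK text format. I want to run two commands: the first converts them to the binary (bfile) format, and the second recodes the binary files to the raw format.
My files, a.map and a.ped: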
> $ cat a.map
1 snp1 0 1
1 snp2 0 2
1 snp3 0 3
> $ cat a.ped
1 1 0 0 1 0 1 1 2 2 1 1
1 2 0 0 2 0 2 2 0 0 2 1
1 3 1 2 1 2 0 0 1 2 2 1
2 1 0 0 1 0 1 1 2 2 0 0
2 2 0 0 2 2 2 2 2 2 0 0
2 3 1 2 1 2 1 1 2 2 1 1
The first command:
plink --file a --out b
I get four files: b.bed, b.bim, b.fam and b.log
(base) [dengfei@localhost plink-test]$ ls b*
b.bed b.bim b.fam b.log
The second command:
plink --bfile b --out c --recodeA
I get two files:
c.log c.raw
Here is my problem:
In the first command, plink uses the --out prefix to generate b.bim, b.bed and b.fam, but I can't use that name for the second command in snakemake.
My first Snakefile:
rule bfile:
    params:
        a1 = "a",
        a2 = "b"
    shell: "plink --file {params.a1} --out {params.a2}"
It runs fine.
(base) [dengfei@localhost plink-test]$ snakemake -s test1.py
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 bfile
1
rule bfile:
jobid: 0
PLINK v1.90b6.5 64-bit (13 Sep 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to b.log.
Options in effect:
--file a
--out b
63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
--file: b.bed + b.bim + b.fam written.
Finished job 0.
1 of 1 steps (100%) done
When I add another rule to the Snakefile to run the second command, I get an error. My Snakefile:
rule all:
    input:
        "c.log", "c.raw"

rule bfile:
    params:
        a1 = "a",
        a2 = "b"
    shell: "plink --file {params.a1} --out {params.a2}"

rule cfile:
    params:
        aa1 = "b",
        aa2 = "c"
    shell: "plink --bfile {params.aa1} --out {params.aa2} --recodeA"
It reports that the input files c.log and c.raw are missing:
MissingInputException in line 1 of /home/dengfei/test/snakemake/plink-test/test1.py:
Missing input files for rule all:
c.log
c.raw
I don't know how to connect the two rules. Any suggestion would be great! Thanks a lot.
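Answer 0 (score: 0)
Snakemake uses input and output files to identify the dependencies in a workflow, and they are missing from your rules. Because no rule declares c.log or c.raw as output, Snakemake cannot tell which rule produces them and treats them as missing input files of rule all. Define input and output files for the rules bfile and cfile, then list the final files of the workflow (the expected output files) in rule all.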
rule all:
    input:
        "c.log", "c.raw"

rule bfile:
    input:
        "input files of rule bfile here"
    output:
        "output files of rule bfile here"
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink --file {params.a1} --out {params.a2}"

rule cfile:
    input:
        "rule bfile outfiles"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink --bfile {params.aa1} --out {params.aa2} --recodeA"
I suggest you work through the snakemake tutorial.
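To make the matching of outputs to inputs concrete, here is a minimal Python sketch of the idea. It is a toy model, not Snakemake's actual implementation: the names RULES and plan are hypothetical, and the model ignores timestamps, wildcards and cycles. Each requested file is traced back to the rule that declares it as output, and that rule's inputs are resolved first.

```python
# Toy model of output-to-input matching; NOT Snakemake's real algorithm.
# RULES and plan() are illustrative names, not part of any library.
RULES = {
    "bfile": {"input": ["a.map", "a.ped"],
              "output": ["b.bed", "b.bim", "b.fam"]},
    "cfile": {"input": ["b.bed", "b.bim", "b.fam"],
              "output": ["c.log", "c.raw"]},
}


def plan(targets, rules, existing):
    """Return the rule names, in execution order, needed to build targets."""
    # Map every declared output file to the rule that produces it.
    producer = {out: name for name, rule in rules.items()
                for out in rule["output"]}
    order = []

    def visit(path):
        if path in producer:
            name = producer[path]
            if name not in order:
                # Resolve the rule's own inputs before scheduling it.
                for dep in rules[name]["input"]:
                    visit(dep)
                order.append(name)
        elif path not in existing:
            # No rule produces the file and it is not on disk.
            raise FileNotFoundError(f"Missing input file: {path}")

    for target in targets:
        visit(target)
    return order


print(plan(["c.log", "c.raw"], RULES, existing={"a.map", "a.ped"}))
# ['bfile', 'cfile']
```

If the input and output lists are left empty, as in the Snakefile from the question, the same call raises the error for c.log, which mirrors the MissingInputException above.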
Answer 1 (score: 0)
With JeeYem's help, the correct code is:
rule all:
    input:
        "c.log", "c.raw"

rule bfile:
    input:
        "a.map", "a.ped"
    output:
        "b.bed", "b.bim", "b.fam"
    # PLINK takes file prefixes, not full file names, so they go in params.
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink --file {params.a1} --out {params.a2}"

rule cfile:
    # These inputs match the outputs of rule bfile, which links the two rules.
    input:
        "b.bed", "b.bim", "b.fam"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink --bfile {params.aa1} --out {params.aa2} --recodeA"
Then I run snakemake, and it produces the result I want:
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 bfile
1 cfile
3
rule bfile:
input: a.map, a.ped
output: b.bed, b.bim, b.fam
jobid: 2
PLINK v1.90b6.5 64-bit (13 Sep 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to b.log.
Options in effect:
--file a
--out b
63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
--file: b.bed + b.bim + b.fam written.
Finished job 2.
1 of 3 steps (33%) done
rule cfile:
input: b.bed, b.bim, b.fam
output: c.log, c.raw
jobid: 1
PLINK v1.90b6.5 64-bit (13 Sep 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Note: --recodeA flag deprecated. Use 'recode A ...'.
Logging to c.log.
Options in effect:
--bfile b
--out c
--recode A
63985 MB RAM detected; reserving 31992 MB for main workspace.
3 variants loaded from .bim file.
6 people (4 males, 2 females) loaded from .fam.
3 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 4 founders and 2 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.777778.
3 variants and 6 people pass filters and QC.
Among remaining phenotypes, 3 are cases and 0 are controls. (3 phenotypes are
missing.)
--recode A to c.raw ... done.
Finished job 1.
2 of 3 steps (67%) done
localrule all:
input: c.log, c.raw
jobid: 0
Finished job 0.
3 of 3 steps (100%) done
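A possible refinement, sketched here in plain Python: the prefixes "b" and "c" are written twice in each rule, once in the file names and once in params. A small helper could derive the PLINK prefix from the declared file names so the two cannot drift apart. The name plink_prefix is hypothetical, and whether and how to call it from a Snakefile (for example from a function passed to params) depends on your Snakemake version, so treat this as an assumption rather than the way it must be done. The command uses the "--recode A" spelling that the PLINK log above recommends in place of --recodeA.

```python
import os


def plink_prefix(paths):
    """Return the prefix shared by PLINK files such as b.bed, b.bim, b.fam."""
    # Strip the extension from every path and collect the distinct results.
    prefixes = {os.path.splitext(p)[0] for p in paths}
    if len(prefixes) != 1:
        raise ValueError(f"files do not share one prefix: {sorted(prefixes)}")
    return prefixes.pop()


in_prefix = plink_prefix(["b.bed", "b.bim", "b.fam"])
out_prefix = plink_prefix(["c.log", "c.raw"])
print(f"plink --bfile {in_prefix} --out {out_prefix} --recode A")
# plink --bfile b --out c --recode A
```

The helper raises a ValueError when the files do not share a single prefix, which catches a mismatch such as declaring b.bed together with c.bim.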