hwo在snakemake中使用plink定义两个规则

时间:2019-04-09 01:51:36

标签: python snakemake

我将在一个简单示例中演示,我有两个名为a.map和a.ped的文件,它们均为plink格式。我想使用两个命令,第一个转换为bfile格式,第二个转换为原始格式。

我的文件:a.map,a.ped:

> $ cat a.map

1 snp1 0 1
1 snp2 0 2
1 snp3 0 3


> $ cat a.ped 

1 1 0 0 1  0  1 1  2 2  1 1
1 2 0 0 2  0  2 2  0 0  2 1
1 3 1 2 1  2  0 0  1 2  2 1
2 1 0 0 1  0  1 1  2 2  0 0
2 2 0 0 2  2  2 2  2 2  0 0
2 3 1 2 1  2  1 1  2 2  1 1

第一个命令:

plink --file a --out b

我得到了四个文件:b.bed b.bam b.fam b.log

(base) [dengfei@localhost plink-test]$ ls b*
b.bed  b.bim  b.fam  b.log

第二个命令:

 plink --bfile b --out c --recodeA

我得到两个文件:

c.log  c.raw

这是我的问题: 在第一个命令中,plink使用--out来生成b.bim, b.bed,b.fam,但是我不能在snakemake的第二个命令中使用该名称。

我的拳头Snakefile:

rule bfile:
params:
    a1 = "a",
    a2 = "b"
shell:"plink --file {params.a1} --out {params.a2}"

运行良好。

    (base) [dengfei@localhost plink-test]$ snakemake -s test1.py 
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   bfile
    1

rule bfile:
    jobid: 0

PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to b.log.
Options in effect:
  --file a
  --out b

63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
--file: b.bed + b.bim + b.fam written.
Finished job 0.
1 of 1 steps (100%) done

当我在snakemake中添加另一个规则以运行第二个命令时,出现了错误,我的Snakefile:

rule all:
    input:
        "c.log","c.raw"
rule bfile:
    params:
        a1 = "a",
        a2 = "b"
    shell:"plink --file {params.a1} --out {params.a2}"
rule cfile:
    params:
        aa1 = "b",
    aa2 = "c"
    shell:"plink --bfile {params.aa1} --out {params.aa2} --recodeA"

它显示c.log和c.raw缺少输入

MissingInputException in line 1 of /home/dengfei/test/snakemake/plink-test/test1.py:
Missing input files for rule all:
c.log
c.raw

我不知道如何连接这两个规则。任何建议都会很棒!非常感谢。

2 个答案:

答案 0 :(得分:0)

Snakemake使用inputoutput文件来识别工作流程中的依赖关系,它们在您的规则中丢失。为规则inputoutput定义bfilecfile文件,然后在rule all中定义工作流的最终文件(或预期的外文件)。

rule all:
    input:
        "c.log","c.raw"

rule bfile:
    input:
        "input files of rule bfile here"
    output:
        "output files of rule bfile here"
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink --file {params.a1} --out {params.a2}"

rule cfile:
    input:
        "rule bfile outfiles"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink --bfile {params.aa1} --out {params.aa2} --recodeA"

我建议您通过snakemake tutorial进行操作。

答案 1 :(得分:0)

在JeeYem的帮助下,正确的代码是:

rule all:
    input:
        "c.log","c.raw"

rule bfile:
    input:
        "a.map","a.ped"
    output:
        "b.bed","b.bim","b.fam"
    params:
        a1 = "a",
        a2 = "b"
    shell:
        "plink --file {params.a1} --out {params.a2}"

rule cfile:
    input:
        "b.bed","b.bim","b.fam"
    output:
        "c.log", "c.raw"
    params:
        aa1 = "b",
        aa2 = "c"
    shell:
        "plink --bfile {params.aa1} --out {params.aa2} --recodeA"

然后我运行snakemake,它会生成我想要的结果:

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1   bfile
    1   cfile
    3

rule bfile:
    input: a.map, a.ped
    output: b.bed, b.bim, b.fam
    jobid: 2

PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to b.log.
Options in effect:
  --file a
  --out b

63985 MB RAM detected; reserving 31992 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 6 people).
--file: b.bed + b.bim + b.fam written.
Finished job 2.
1 of 3 steps (33%) done

rule cfile:
    input: b.bed, b.bim, b.fam
    output: c.log, c.raw
    jobid: 1

PLINK v1.90b6.5 64-bit (13 Sep 2018)           www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Note: --recodeA flag deprecated.  Use 'recode A ...'.
Logging to c.log.
Options in effect:
  --bfile b
  --out c
  --recode A

63985 MB RAM detected; reserving 31992 MB for main workspace.
3 variants loaded from .bim file.
6 people (4 males, 2 females) loaded from .fam.
3 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 4 founders and 2 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.777778.
3 variants and 6 people pass filters and QC.
Among remaining phenotypes, 3 are cases and 0 are controls.  (3 phenotypes are
missing.)
--recode A to c.raw ... done.
Finished job 1.
2 of 3 steps (67%) done

localrule all:
    input: c.log, c.raw
    jobid: 0

Finished job 0.
3 of 3 steps (100%) done