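Paired-end bwa alignment using snakemake

Time: 2019-05-23 08:33:28

Tags: python bioinformatics snakemake snakeyaml

I'm new to snakemake and have run into a simple problem while doing mapping with it. I have several _1.fastq.gz and _2.fastq.gz files, and I want to do paired mapping for around 10 pairs of fastq.gz. So I wrote a snakemake file for this.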

Error:

import os
import snakemake.io
import glob

(SAMPLES,READS,) = glob_wildcards("raw/{sample}_{read}.fastq.gz")
READS=["1","2"]
REF="/data/data/reference/refs/ucsc.hg19.fasta.gz"

rule all:
    input: expand("raw/{sample}.bam",sample=SAMPLES)

rule bwa_map:
    input:
        ref=REF,
        r1=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS),
        r2=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS)

    output: "raw/{sample}.bam"

    shell: "bwa mem -M -t 8 {input.ref} {input.r1} {input.r2} | samtools view -Sbh - > {output}"

The output I want is all 10 bam files, like this:

sub1.bam sub2.bam sub3.bam ...

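It seems to put all the fastq files into a single command. How can I split them up and run them pair by pair automatically, without hardcoding? Please advise.

1 answer:

Answer 0: (score: 2)

The first rule (here, rule all) specifies which files you want to create in the snakemake workflow.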

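For a given file f in rule all::input, snakemake looks through all the rules and tries to find one that can create f (by pattern-matching against the output segment of each rule).

Suppose f = raw/my_sample.bam

Once snakemake has found a rule that can create f, it determines all the input files needed to make that file.

So here, snakemake finds that f = raw/my_sample.bam can be created by rule bwa_map (because f matches the pattern raw/<anything>.bam), and then works out which files are needed to make f (based on the input segment).

Snakemake reasons: I can make raw/my_sample.bam if I have the file ref="/data/data/reference/refs/ucsc.hg19.fasta.gz", the files r1=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS) and the files r2=expand("raw/{sample}_{read}.fastq.gz",sample=SAMPLES,read=READS)

In r1 and r2, expand expands sample to every value in SAMPLES and read to every value in READS. But you have 10 values in SAMPLES and 2 values in READS, so r1 expands to 20 different file paths for every output file snakemake tries to make. It ignores the sample wildcard that is present in the output clause (because you have overridden that wildcard in the expand call).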
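To see why every job receives all 20 files, here is a minimal pure-Python sketch of the cross product that expand performs. The helper expand_like and the sample names sub1 to sub10 are assumptions made for illustration; this is not Snakemake's own implementation, only an imitation of its default all-combinations behaviour.

```python
from itertools import product

# Hypothetical sample names standing in for the asker's 10 pairs.
SAMPLES = [f"sub{i}" for i in range(1, 11)]
READS = ["1", "2"]


def expand_like(pattern, **wildcards):
    """Imitate expand(): fill the pattern with every combination
    of the supplied wildcard values (illustrative helper, not Snakemake code)."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


r1 = expand_like("raw/{sample}_{read}.fastq.gz", sample=SAMPLES, read=READS)

# 10 samples x 2 reads: the list is the same no matter which .bam is requested.
print(len(r1))
print(r1[:4])
```

Because the resulting list does not depend on the requested output file, each bwa mem call is handed the reads of all samples rather than one pair.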

You have to let the wildcards defined in the output clause bubble up to the input clause:

import os
import snakemake.io
import glob

(SAMPLES,READS,) = glob_wildcards("raw/{sample}_{read}.fastq.gz")
READS=["1","2"]
REF="/data/data/reference/refs/ucsc.hg19.fasta.gz"

rule all:
    input: expand("raw/{sample}.bam",sample=SAMPLES)

rule bwa_map:
    input:
        ref=REF,
        # determine `r1` based on the {sample} wildcard defined in `output`
        # and the fixed value `1` to indicate the read direction
        r1="raw/{sample}_1.fastq.gz",
        # determine `r2` based on the {sample} wildcard similarly
        r2="raw/{sample}_2.fastq.gz"

    output: "raw/{sample}.bam"

    # better to pass in the threads than to hardcode them in the shell command
    threads: 8

    shell: "bwa mem -M -t {threads} {input.ref} {input.r1} {input.r2} | samtools view -Sbh - > {output}"

I urge you to look at how the bwa alignment rule is written in the snakemake wrappers resource (you may also want to consider using the wrapper): https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html

Off-topic: from a code-review perspective, I'd question why you output the aligned data into the raw directory. Wouldn't it make more sense to write the aligned data to a directory of its own? You also seem to import packages that you never use.
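For readers who want to see the wildcard propagation in the corrected bwa_map rule spelled out, the following is a rough pure-Python sketch. The helper match_output is hypothetical and only a simplified stand-in for Snakemake's matching: it turns the output pattern into a regular expression, extracts the sample value from the requested file, and fills that value into the input patterns.

```python
import re


def match_output(pattern, target):
    """Return the wildcard values that make `pattern` equal `target`,
    or None if the target does not match (illustrative helper)."""
    # Escape the literal parts, then turn each {name} into a named group.
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


# Input patterns taken from the corrected rule.
inputs = {
    "r1": "raw/{sample}_1.fastq.gz",
    "r2": "raw/{sample}_2.fastq.gz",
}

# Requesting raw/my_sample.bam fixes the value of {sample} ...
wildcards = match_output("raw/{sample}.bam", "raw/my_sample.bam")

# ... and that single value is substituted into each input pattern.
resolved = {name: path.format(**wildcards) for name, path in inputs.items()}
print(wildcards)
print(resolved)
```

Each requested .bam file thus resolves to exactly one _1/_2 pair, which is the behaviour the question asks for.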