Best way to install non-conda dependencies in a Snakemake conda environment

Time: 2020-10-31 01:47:38

Tags: python conda snakemake

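I would like to install R packages from GitHub in an R conda environment created by Snakemake, and to install Python libraries with pip in a Python environment. I will then use these environments throughout the whole set of rules.

My first idea was to create a rule that runs a script to install the required packages.

For example, my initial run was: snakemake -j1 --use-conda -R create_r_environment

My Snakefile: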

rule create_r_environment:
    conda:
        "envs/r.yaml"
    script:
        "scripts/r-dependencies.R"

rule create_python_environment:
    conda:
        "envs/python.yaml"
    script:
        "scripts/python-dependencies.py"    

My envs/r.yaml file:

channels:
 - conda-forge
dependencies:
 - r=4.0

My r-dependencies.R file:

remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

My envs/python.yaml file:

channels:
 - conda-forge
dependencies:
 - python=3.8.2

My python-dependencies.py file:

!pip install gseapy

Log output:

Building DAG of jobs...
Creating conda environment envs/r.yaml...
Downloading and installing remote packages.
Environment for envs/r.yaml created (location: .snakemake/conda/388f7df8)
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   create_r_environment
    1

[Fri Oct 30 22:38:56 2020]
rule create_r_environment:
    jobid: 0

Activating conda environment: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
[Fri Oct 30 22:38:57 2020]
Error in rule create_r_environment:
    jobid: 0
    conda-env: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8

RuleException:
CalledProcessError in line 5 of /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile:
Command 'source /home/cmcouto-silva/miniconda3/bin/activate '/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8'; set -euo pipefail;  Rscript --vanilla /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/scripts/tmpa6jdxovx.r-dependencies.R' returned non-zero exit status 1.
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
  File "/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile", line 5, in __rule_create_r_environment
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/log/2020-10-30T223743.852983.snakemake.log

My folder structure:

.
├── envs
│   ├── python.yaml
│   └── r.yaml
├── scripts
│   ├── python-dependencies.py
│   └── r-dependencies.R
└── Snakefile

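The environment is created successfully, but the run fails when the script is executed, and I don't know why. I changed the contents of envs/r.yaml to install.packages("data.table") to check whether the GitHub package was the problem, but it was not; it fails regardless. The same happens when I run the rule create_python_environment (output not shown here).

Any help?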


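Edit after accepting the answer

As @dariober pointed out, I forgot to install the remotes package before calling it in the script. I added it to the .yaml file and it worked well. I also installed the pip libraries using the shell instead of a Python file.

I want to highlight a few points in case someone faces the same or a similar problem:

First, I could successfully install the other packages I needed, but some of them require specific libraries (e.g. libcurl). That library is installed on my system but is not recognized inside the Snakemake conda environment, which forces me either to install it in the Snakemake conda environment (good for reproducibility, although I don't yet know how to do it) or to specify the library path. Perhaps a better option is to use containers, as @merv commented.

Second, I found that Snakemake already provides a way to install pip libraries through the .yaml file. In the documentation, it looks like this: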

name: stats2
channels:
  - javascript
dependencies:
  - python=3.6   # or 2.7
  - bokeh=0.9.2
  - numpy=1.9.*
  - nodejs=0.10.*
  - flask
  - pip:
    - Flask-Testing

1 answer:

Answer 0 (score: 1)

I think several things are wrong:

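  • remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never"): your r.yaml should include the remotes package.

  • !pip install gseapy is not valid Python code. If anything, it is code to be executed by the shell, but I'm not sure the leading ! is correct. Besides, gseapy is available from bioconda, so I don't see why you would need pip to install it.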
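
For the second point, the leading ! is IPython/Jupyter shell-escape syntax, so a plain Python script cannot run it. If pip really has to be called from scripts/python-dependencies.py, one option is to invoke it through the interpreter that is running the script. This is only a minimal sketch: the helper names pip_install_command and pip_install are made up for illustration, and it assumes pip is available in the conda environment.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command for the interpreter running this script."""
    # sys.executable is the Python of the currently active environment,
    # so pip installs the package into that same environment.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package))
```

Calling pip_install("gseapy") at the end of the script would then perform the installation. Declaring the package in the environment .yaml file is still likely to be more reproducible than installing it from a rule.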


Before the OP edited the question

My envs/r.yaml file:

remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

It is odd that your conda environment was created correctly, because r.yaml is not a valid environment file.

Here is my attempt to reproduce your problem:

r.yaml

 cat r.yaml  
 remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

Snakefile:

cat Snakefile 
rule create_r_environment:
    conda:
        "r.yaml"
    script:
        "r-dependencies.R"

Execution:

snakemake -j1 --use-conda -R create_r_environment

Building DAG of jobs...
Creating conda environment r.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/dario/Downloads/r.yaml:

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 80, in do_call
        exit_code = getattr(module, func_name)(args, parser)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 80, in execute
        directory=os.getcwd())
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 40, in detect
        if spec.can_handle():
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
        self._environment = env.from_file(self.filename)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 151, in from_file
        return from_yaml(yamlstr, filename=filename)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 137, in from_yaml
        data = validate_keys(data, kwargs)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 35, in validate_keys
        new_data = data.copy() if data else {}
    AttributeError: 'str' object has no attribute 'copy'

`$ /home/dario/miniconda3/bin/conda-env create --file /home/dario/Downloads/.snakemake/conda/095b0ca2.yaml --prefix /home/dario/Downloads/.snakemake/conda/095b0ca2`

  environment variables:
                 CIO_TEST=<not set>
        CMAKE_PREFIX_PATH=/home/dario/miniconda3/envs/tritume:/home/dario/miniconda3/envs/tritum
                          e/x86_64-conda-linux-gnu/sysroot/usr
  CONDA_AUTO_UPDATE_CONDA=false
      CONDA_BUILD_SYSROOT=/home/dario/miniconda3/envs/tritume/x86_64-conda-linux-gnu/sysroot
        CONDA_DEFAULT_ENV=tritume
                CONDA_EXE=/home/dario/miniconda3/bin/conda
             CONDA_PREFIX=/home/dario/miniconda3/envs/tritume
    CONDA_PROMPT_MODIFIER=(tritume)
         CONDA_PYTHON_EXE=/home/dario/miniconda3/bin/python
               CONDA_ROOT=/home/dario/miniconda3
              CONDA_SHLVL=1
            DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
           MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
                     PATH=/home/dario/miniconda3/envs/tritume/bin:/home/dario/miniconda3/condabi
                          n:/opt/gradle/gradle-5.2/bin:/home/dario/.local/share/umake/bin:/home/
                          dario/.local/bin:/home/dario/bin:/opt/gradle/gradle-5.2/bin:/usr/local
                          /sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/loc
                          al/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-1
                          0-oracle/db/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
               WINDOWPATH=2

     active environment : tritume
    active env location : /home/dario/miniconda3/envs/tritume
            shell level : 1
       user config file : /home/dario/.condarc
 populated config files : /home/dario/.condarc
          conda version : 4.8.3
    conda-build version : not installed
         python version : 3.7.6.final.0
       virtual packages : __glibc=2.27
       base environment : /home/dario/miniconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/dario/miniconda3/pkgs
                          /home/dario/.conda/pkgs
       envs directories : /home/dario/miniconda3/envs
                          /home/dario/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.6 Linux/4.15.0-91-generic ubuntu/18.04.4 glibc/2.27
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False


An unexpected error has occurred. Conda has prepared the above report.

If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?

[y/N]: 
Timeout reached. No report sent.


  File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/snakemake/deployment/conda.py", line 320, in create
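
The AttributeError in the report above suggests that the file content was loaded as a plain string rather than as a mapping with keys such as channels and dependencies. The sketch below imitates only the failing expression from the traceback (data.copy() if data else {}); the function name copy_or_empty is hypothetical, and the claim that the R call is parsed as a string is an inference from the 'str' in the error message, not something checked against conda's code.

```python
def copy_or_empty(data):
    """Hypothetical stand-in for the expression that fails in the traceback."""
    return data.copy() if data else {}


# A valid environment file parses to a mapping, which has a .copy() method.
valid = {"channels": ["conda-forge"], "dependencies": ["r=4.0"]}
copied = copy_or_empty(valid)

# A file holding only an R call is assumed to parse to a plain string.
invalid = 'remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")'
message = None
try:
    copy_or_empty(invalid)
except AttributeError as err:
    # Strings have no .copy(), which matches the error conda reported.
    message = str(err)

print(copied)
print(message)
```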

In any case, your error says:

... r-dependencies.R' returned non-zero exit status 1

What do you have in r-dependencies.R?