Question

I have a snakemake workflow in which some rules use a complex function as input:

def source_fold_data(wildcards):
    fold_type = wildcards.fold_type
    if fold_type in {"log2FoldChange", "lfcMLE"}:
        if hasattr(wildcards, "contrast_type"):
            # OPJ is os.path.join
            return expand(
                OPJ(output_dir, aligner, "mapped_C_elegans",
                    "deseq2_%s" % size_selected, "{contrast}",
                    "{contrast}_{{small_type}}_counts_and_res.txt"),
                contrast=contrasts_dict[wildcards.contrast_type])
        else:
            return rules.small_RNA_differential_expression.output.counts_and_res
    elif fold_type == "mean_log2_RPKM_fold":
        if hasattr(wildcards, "contrast_type"):
            # This is the branch used when I have the AttributeError
            #https://stackoverflow.com/a/26791923/1878788
            return [filename.format(wildcards) for filename in expand(
                OPJ(output_dir, aligner, "mapped_C_elegans",
                    "RPKM_folds_%s" % size_selected, "{contrast}",
                    "{contrast}_{{0.small_type}}_RPKM_folds.txt"),
                contrast=contrasts_dict[wildcards.contrast_type])]
        else:
            return rules.compute_RPKM_folds.output.fold_results
    else:
        raise NotImplementedError("Unknown fold type: %s" % fold_type)

The function above is used as the input of two rules:

rule make_gene_list_lfc_boxplots:
    input:
        data = source_fold_data,
    output:
        boxplots = OPJ(output_dir, "figures", "{contrast}",
            "{contrast}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}")
    params:
        id_lists = set_id_lists,
    run:
        data = pd.read_table(input.data, index_col="gene")
        lfcs = pd.DataFrame(
            {list_name : data.loc[set(id_list)][wildcards.fold_type] for (
                list_name, id_list) in params.id_lists.items()})
        save_plot(output.boxplots, plot_boxplots, lfcs, wildcards.fold_type)


rule make_contrast_lfc_boxplots:
    input:
        data = source_fold_data,
    output:
        boxplots = OPJ(output_dir, "figures", "all_{contrast_type}",
            "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}")
    params:
        id_lists = set_id_lists,
    run:
        lfcs = pd.DataFrame(
            {f"{contrast}_{list_name}" : pd.read_table(filename, index_col="gene").loc[
                set(id_list)]["mean_log2_RPKM_fold"] for (
                    contrast, filename) in zip(contrasts_dict["ip"], input.data) for (
                        list_name, id_list) in params.id_lists.items()})
        save_plot(output.boxplots, plot_boxplots, lfcs, wildcards.fold_type)

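The second one fails with 'InputFiles' object has no attribute 'data', but only in some cases: I run the same workflow with two different configuration files, and the error occurs with only one of them, even though this rule is executed in both cases and the same branch of the input function is taken.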

How can this happen, given that the rule has:

    input:
        data = ...

I suppose this has something to do with what my source_fold_data returns: either the explicit output of another rule, or a list of file names built "by hand".

Answer 1

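As @Colin suggested in the comments, the problem occurs when the input function returns an empty list. This is the case when contrasts_dict[wildcards.contrast_type] is an empty list, a condition under which it makes no sense to try to generate the output of rule make_contrast_lfc_boxplots in the first place. I avoided the situation by modifying the input section of rule all as follows: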

Old version:

rule all:
    input:
        # [...]
        expand(
            OPJ(output_dir, "figures", "all_{contrast_type}",
                "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}"),
            contrast_type=["ip"], small_type=IP_TYPES,
            fold_type=["mean_log2_RPKM_fold"], gene_list=BOXPLOT_GENE_LISTS,
            fig_format=FIG_FORMATS),
        # [...]

New version:

# Only request the boxplots when there are "ip" contrasts to plot.
if contrasts_dict["ip"]:
    ip_fold_boxplots = expand(
        OPJ(output_dir, "figures", "all_{contrast_type}",
            "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}"),
        contrast_type=["ip"], small_type=IP_TYPES,
        fold_type=["mean_log2_RPKM_fold"], gene_list=BOXPLOT_GENE_LISTS,
        fig_format=FIG_FORMATS)
else:
    ip_fold_boxplots = []

rule all:
    input:
        # [...]
        ip_fold_boxplots,
        # [...]
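
The guard above can be wrapped in a small helper so that the same check is reusable for other contrast types. The following is a minimal sketch in plain Python, not the code from the workflow: expand_targets is a simplified stand-in for snakemake's expand, and boxplot_targets, PATTERN and the example values are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical, shortened target pattern (the real one is built with OPJ).
PATTERN = "figures/all_{contrast_type}/{contrast_type}_{small_type}_boxplots.{fig_format}"


def expand_targets(pattern, **wildcard_values):
    """Simplified stand-in for snakemake's expand(): one path per combination."""
    names = list(wildcard_values)
    combos = product(*(wildcard_values[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


def boxplot_targets(contrasts_dict, contrast_type, **wildcard_values):
    """Return the boxplot targets, or [] when the contrast type has no contrasts."""
    if not contrasts_dict.get(contrast_type):
        return []
    return expand_targets(PATTERN, contrast_type=[contrast_type], **wildcard_values)


# Example values (assumptions, not taken from the original configuration).
example_contrasts = {"ip": [], "de": ["mut_vs_WT"]}
no_targets = boxplot_targets(
    example_contrasts, "ip", small_type=["siRNA"], fig_format=["pdf"])
some_targets = boxplot_targets(
    example_contrasts, "de", small_type=["siRNA"], fig_format=["pdf"])
```

With such a helper, rule all could list the returned value among its inputs in the same way as ip_fold_boxplots, since an empty list simply requests no targets.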

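Some tinkering in snakemake/rules.py showed that, at some point, a data attribute does exist on the input attribute of the Rule object named make_contrast_lfc_boxplots, and that this attribute is still the source_fold_data function. I suppose it is later evaluated and then removed because it is an empty list, but I have not been able to find where this happens.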
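
To make the suspected mechanism concrete, here is a toy model in plain Python. It is not Snakemake's implementation: ToyInputFiles and resolve_inputs are hypothetical names, and the rule that an empty group leaves no attribute behind is an assumption that merely reproduces the symptom described above.

```python
class ToyInputFiles(list):
    """Toy model only: a list of files that can also expose named groups."""

    def add_named(self, name, files):
        files = list(files)
        # Assumption for illustration: an empty group registers no attribute.
        if files:
            self.extend(files)
            setattr(self, name, files)


def resolve_inputs(input_function, wildcards):
    """Call the input function and store its result under the name 'data'."""
    resolved = ToyInputFiles()
    resolved.add_named("data", input_function(wildcards))
    return resolved


# An input function that returns files versus one that returns nothing.
with_files = resolve_inputs(lambda wildcards: ["a_RPKM_folds.txt"], None)
without_files = resolve_inputs(lambda wildcards: [], None)

print(with_files.data)
print(hasattr(without_files, "data"))
# Defensive access in a run block would fall back to an empty list:
print(getattr(without_files, "data", []))
```

In this model, input.data raises AttributeError exactly when the input function returns an empty list, which matches the behaviour reported in the question.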

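I suppose that an empty input is not a problem while snakemake constructs the dependency graph between rules, so the problem only shows up when the rule is executed.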
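
Since the failure only appears at execution time, one option is to have the input function check for an empty list itself and raise a descriptive error. This is a minimal sketch, assuming a hypothetical helper named contrasts_for that source_fold_data would call instead of indexing contrasts_dict directly; how Snakemake reports an exception raised inside an input function may depend on the version.

```python
def contrasts_for(contrasts_dict, contrast_type):
    """Return the contrasts of a given type, refusing an empty or missing list."""
    contrasts = contrasts_dict.get(contrast_type, [])
    if not contrasts:
        # Fail early with a clear message instead of returning an empty input.
        raise ValueError(
            f"No contrasts configured for contrast type {contrast_type!r}; "
            "do not request outputs that depend on it.")
    return list(contrasts)
```

The error message then points at the configuration rather than at a missing attribute on the input object.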
