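HOMER de novo motif discovery cannot open the hg19 FASTA files

Time: 2015-05-11 08:06:25

Tags: perl

I have some ChIP-seq data in BAM format. At some point I want to run de novo motif discovery using HOMER's findMotifsGenome.pl script.

The problem seems to be that the application cannot open the reference genome FASTA files, even though they were installed by the application itself!

Has anyone run into this problem?

Linux command used: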

$ perl /home/chipseq_project/homer/bin/findMotifsGenome.pl /home/chipseq_project/homer/findpeak_output/peaks.txt hg19 /home/chipseq_project/homer/motif_output/ -size given

Standard output:

    Position file = /home/chipseq_project/homer/findpeak_output/peaks.txt
    Genome = hg19
    Output Directory = /home/chipseq_project/homer/motif_output/
    Using actual sizes of regions (-size given)
    Fragment size set to given
    Found mset for "human", will check against vertebrates motifs
    Peak/BED file conversion summary:
            BED/Header formatted lines: 0
            peakfile formatted lines: 7662

    Peak File Statistics:
            Total Peaks: 7662
            Redundant Peak IDs: 0
            Peaks lacking information: 0 (need at least 5 columns per peak)
            Peaks with misformatted coordinates: 0 (should be integer)
            Peaks with misformatted strand: 0 (should be either +/- or 0/1)

    Peak file looks good!

    Background fragment size set to 81 (avg size of targets)
    Background files for 81 bp fragments found.

    Extracting sequences from directory: /home/chipseq_project/homer/.//data/genomes/hg19//
    !!Could not open file for 1 (.fa or .fa.masked)
    !!Could not open file for 10 (.fa or .fa.masked)
    !!Could not open file for 11 (.fa or .fa.masked)
    !!Could not open file for 12 (.fa or .fa.masked)
    !!Could not open file for 13 (.fa or .fa.masked)
    !!Could not open file for 14 (.fa or .fa.masked)
    !!Could not open file for 15 (.fa or .fa.masked)
    !!Could not open file for 16 (.fa or .fa.masked)
    !!Could not open file for 17 (.fa or .fa.masked)
    !!Could not open file for 18 (.fa or .fa.masked)
    !!Could not open file for 19 (.fa or .fa.masked)
    !!Could not open file for 2 (.fa or .fa.masked)
    !!Could not open file for 20 (.fa or .fa.masked)
    !!Could not open file for 21 (.fa or .fa.masked)
    !!Could not open file for 22 (.fa or .fa.masked)
    !!Could not open file for 3 (.fa or .fa.masked)
    !!Could not open file for 4 (.fa or .fa.masked)
    !!Could not open file for 5 (.fa or .fa.masked)
    !!Could not open file for 6 (.fa or .fa.masked)
    !!Could not open file for 7 (.fa or .fa.masked)
    !!Could not open file for 8 (.fa or .fa.masked)
    !!Could not open file for 9 (.fa or .fa.masked)
    !!Could not open file for X (.fa or .fa.masked)
    !!Could not open file for Y (.fa or .fa.masked)

    Not removing redundant sequences


    Sequences processed:
            0 total

    Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
    Freq    Bin     Count

    Total sequences set to 50000

    Choosing background that matches in CpG/GC Content...

    Illegal division by zero at /home/chipseq_project/homer/bin/assignGeneWeights.pl line 63.
    Assembling sequence file...
    Normalizing lower order oligos using homer2

    Reading input files...
    0 total sequences read
    Autonormalization: 1-mers (4 total)
            A       inf%    inf%    -nan
            C       inf%    inf%    -nan
            G       inf%    inf%    -nan
            T       inf%    inf%    -nan
    Autonormalization: 2-mers (16 total)
            AA      inf%    inf%    -nan
            CA      inf%    inf%    -nan
            GA      inf%    inf%    -nan
            TA      inf%    inf%    -nan
            AC      inf%    inf%    -nan
            CC      inf%    inf%    -nan
            GC      inf%    inf%    -nan
            TC      inf%    inf%    -nan
            AG      inf%    inf%    -nan
            CG      inf%    inf%    -nan
            GG      inf%    inf%    -nan
            TG      inf%    inf%    -nan
            AT      inf%    inf%    -nan
            CT      inf%    inf%    -nan
            GT      inf%    inf%    -nan
            TT      inf%    inf%    -nan
    Autonormalization: 3-mers (64 total)
    Normalization weights can be found in file: /home/chipseq_project/homer/motif_output//seq.autonorm.tsv
    Converging on autonormalization solution:
    ...............................................................................
    Final normalization:    Autonormalization: 1-mers (4 total)
            A       inf%    inf%    -nan
            C       inf%    inf%    -nan
            G       inf%    inf%    -nan
            T       inf%    inf%    -nan
    Autonormalization: 2-mers (16 total)
            AA      inf%    inf%    -nan
            CA      inf%    inf%    -nan
            GA      inf%    inf%    -nan
            TA      inf%    inf%    -nan
            AC      inf%    inf%    -nan
            CC      inf%    inf%    -nan
            GC      inf%    inf%    -nan
            TC      inf%    inf%    -nan
            AG      inf%    inf%    -nan
            CG      inf%    inf%    -nan
            GG      inf%    inf%    -nan
            TG      inf%    inf%    -nan
            AT      inf%    inf%    -nan
            CT      inf%    inf%    -nan
            GT      inf%    inf%    -nan
            TT      inf%    inf%    -nan
    Autonormalization: 3-mers (64 total)
    Finished preparing sequence/group files

    ----------------------------------------------------------
    Known motif enrichment

    Reading input files...
    0 total sequences read
    264 motifs loaded
    Cache length = 11180
    Using binomial scoring
    Checking enrichment of 264 motif(s)
    |0%                                    50%                                  100%|

    Illegal division by zero at /home/chipseq_project/homer/bin/findKnownMotifs.pl line 142.
    ----------------------------------------------------------
    De novo motif finding (HOMER)

    Scanning input files...

    !!! Something is wrong... are you sure you chose the right length for motif finding?
    !!! i.e. also check your sequence file!!!

    Scanning input files...

    !!! Something is wrong... are you sure you chose the right length for motif finding?
    !!! i.e. also check your sequence file!!!

    -blen automatically set to 2
    Scanning input files...

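    !!! Something is wrong... are you sure you chose the right length for motif finding?
    !!! i.e. also check your sequence file!!!
    Use of uninitialized value in numeric gt (>) at /home/chipseq_project/homer/bin/compareMotifs.pl line 1289.
    !!! Filtered out all motifs!!!
    Job finished - if results look good, please send beer to ..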

    Cleaning up tmp files...

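3 answers:

Answer 0 (score: 1)

One thing to check: whether the chromosome naming in your BED file matches the chromosome naming in the genome you are using. For example, chromosome 12 should not be '12' in your BED file while it is 'chr12' in the genome of interest.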
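
A quick way to run that check, as a rough sketch: the function below prints the distinct chromosome names used in a HOMER peak file. It assumes the usual HOMER peak layout (tab-separated, chromosome in column 2, header lines starting with `#`); the name `peak_chroms` is made up for illustration, and the paths in the comments are the ones from the question.

```shell
#!/usr/bin/env bash
# Print the distinct chromosome names used in a HOMER peak file.
# Assumption: tab-separated columns with the chromosome in column 2,
# and header/comment lines starting with '#'.
peak_chroms() {
    awk -F'\t' '!/^#/ && NF >= 2 { print $2 }' "$1" | sort -u
}

# Example use (paths from the question; uncomment to run):
# peak_chroms /home/chipseq_project/homer/findpeak_output/peaks.txt
# ls /home/chipseq_project/homer/data/genomes/hg19/
```

If the function prints bare names such as `1` and `X` while the genome directory is organised around `chr1` and `chrX`, the two naming schemes do not match. The "Could not open file for 1" lines in the log above suggest that is the case here.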

Answer 1 (score: 0)

For the "chr" problem, a simple awk command is your friend. A simple awk '{print "chr" $0}' your.bed > your_new.bed will do the job. hkoohy

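Note that the file in the question is a HOMER peak file rather than a BED file (the log reports 0 BED lines and 7662 peak-file lines), so the chromosome sits in column 2, and prepending "chr" to the whole line would change the peak ID instead. A minimal sketch for that format, assuming tab-separated columns and `#` header lines; `add_chr_prefix` is a name invented here, not a HOMER tool:

```shell
#!/usr/bin/env bash
# Add a "chr" prefix to column 2 of a HOMER peak file.
# Assumption: tab-separated columns, chromosome in column 2,
# header/comment lines starting with '#'.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }            # keep header lines unchanged
         $2 !~ /^chr/ { $2 = "chr" $2 }  # prefix only names lacking chr
         { print }' "$1"
}

# Example use (file names are placeholders; uncomment to run):
# add_chr_prefix peaks.txt > peaks_chr.txt
```

This only handles the prefix. Names that differ in other ways (for example `MT` versus `chrM` for the mitochondrial genome) would still need to be mapped by hand.
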
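Answer 2 (score: 0)

I ran into this problem too, even though my BED file was fine. The trick that solved it was converting my .bed file to a .pos file with the following command: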

bed2pos.pl file.bed > file.pos

I hope this helps you all :)

Best, 芙蓉