Question

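I am trying to debug my program to find the error: when I run my code, it only prints the DNA string instead of the gene sequences. The problem area is the while statement in the printAll method. I need to call the findStopIndex method inside the while loop, but I would like to know why I get no genes printed when I run it. Any insight would be appreciated.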

public class FindMultiGenes4 {
    public int findStopIndex(String dna, int index) {
        int stop1 = dna.indexOf("tga", index);
        if (stop1 == -1 || (stop1 - index) % 3 != 0) {
            stop1 = dna.length();
        }
        int stop2 = dna.indexOf("taa", index);
        if (stop2 == -1 || (stop2 - index) % 3 != 0) {
            stop2 = dna.length();
        }
        int stop3 = dna.indexOf("tag", index);
        if (stop3 == -1 || (stop3 - index) % 3 != 0) {
            stop3 = dna.length();
        }
        return Math.min(stop1, Math.min(stop2, stop3));
    }

    public void printAll(String dna) {
        dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
        String sequence = dna.toLowerCase();
        int index = 0;
        int newIndex = 0;

        while (true) {
            index = sequence.indexOf("atg", index);
            if (index == -1)
                break;
            if (newIndex == -1) // Check needed only if a stop codon is not guaranteed for each start codon.
                break;
            int stop = findStopIndex(dna, index);
            if (stop != sequence.length()) {
                System.out.println("From " + (index) + " to " + newIndex+3 + " Gene: " + sequence.substring(index, stop+3));
                index = sequence.substring(index, stop + 3).length();
            } else {
                index = index + 3;
            }
        }
    }

    public void testFinder() {
        FindMultiGenes4 FMG = new FindMultiGenes4();

        String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";

        FMG.printAll(dna);

        System.out.println("DNA: " + dna);
    }
}

Answer 1

The problem is in the following line:

int stop = findStopIndex(dna, index);

dna is an uppercase string, whereas findStopIndex searches for the lowercase stop sequences tga, taa, and tag.
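To see the mismatch in isolation: String.indexOf is case-sensitive, so a lowercase pattern never matches an uppercase string. A minimal sketch (the class CaseMismatchDemo and the helper firstStop are made-up names for illustration, not part of the original program):

```java
class CaseMismatchDemo {
    // Hypothetical helper: index of the first "taa" at or after 'from', or -1 if absent.
    static int firstStop(String dna, int from) {
        return dna.indexOf("taa", from);
    }

    public static void main(String[] args) {
        String dna = "CATGTAATAG";
        // Uppercase input: the lowercase pattern is never found.
        System.out.println(firstStop(dna, 0));               // prints -1
        // Lowercased input: the stop sequence is found at index 4.
        System.out.println(firstStop(dna.toLowerCase(), 0)); // prints 4
    }
}
```

Normalizing the case once, before any searching, keeps every later comparison consistent.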

The code has a few other minor issues, which I hope I have corrected; see the modified code below.

public class FindMultiGenes4 {
    private static final String GENE_PREFIX = "ATG";
    private static final String[] GENE_SUFFIXES = {"TGA", "TAA", "TAG"};

    public int findStopIndex(String dna, int index) {
        int minStop = dna.length();
        for(String suffix : GENE_SUFFIXES) {
            int stop = -1;
            int localIndex = index;
            do{//repeating if the match found is not multiple of 3
                stop = dna.indexOf(suffix, localIndex);
                if(stop == -1) {
                    stop = dna.length();
                    break;
                }
                localIndex = stop + 3;
            } while( (stop - index) % 3 != 0);

            if(minStop > stop) {
                minStop = stop;
            }
        }
        return minStop + 3;
    }

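    public void printAll(String dna) {
        String localDna = dna.toUpperCase();
        int index = 0;
        while(index != -1 && index + 3 < localDna.length()) {
            index = localDna.indexOf(GENE_PREFIX, index);
            if(index == -1) {
                break;
            }
            // findStopIndex returns the exclusive end of the gene, or a value
            // greater than dna.length() when no in-frame stop codon exists
            int stop = findStopIndex(localDna, index + 3);
            if(stop <= dna.length()) { // <= so a gene ending exactly at the end of the string is printed
                System.out.println("From " + (index) + " to " + stop
                        + " Gene: " + dna.substring(index, stop));
                index = stop;
            } else {
                // no stop codon in this reading frame; keep looking for later start codons
                index = index + 3;
            }
        }
    }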

    public static void main(String[] args) {

        FindMultiGenes4 FMG = new FindMultiGenes4();

        String[] dnaSamples = {"CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA",
                "catgtaatagatgaatgactgatagatatgcttgtatgctatgaaaatgtgaaatgaccca",
                "cAtGtAaTaGaTgAaTgAcTgAtAgAtAtGcTtGtAtGcTaTgAaAaTgTgAaAtGaCcCa",
                "ATGAAATGAAAA",
                "ccatgccctaataaatgtctgtaatgtaga"};

        for(String dna : dnaSamples) {
            System.out.println("DNA: " + dna);
            FMG.printAll(dna);
            System.out.println("");
        }
    }
}

Output

DNA: CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA
From 1 to 7 Gene: ATGTAA
From 10 to 22 Gene: ATGAATGACTGA
From 27 to 57 Gene: ATGCTTGTATGCTATGAAAATGTGAAATGA

DNA: catgtaatagatgaatgactgatagatatgcttgtatgctatgaaaatgtgaaatgaccca
From 1 to 7 Gene: atgtaa
From 10 to 22 Gene: atgaatgactga
From 27 to 57 Gene: atgcttgtatgctatgaaaatgtgaaatga

DNA: cAtGtAaTaGaTgAaTgAcTgAtAgAtAtGcTtGtAtGcTaTgAaAaTgTgAaAtGaCcCa
From 1 to 7 Gene: AtGtAa
From 10 to 22 Gene: aTgAaTgAcTgA
From 27 to 57 Gene: AtGcTtGtAtGcTaTgAaAaTgTgAaAtGa

DNA: ATGAAATGAAAA
From 0 to 9 Gene: ATGAAATGA

DNA: ccatgccctaataaatgtctgtaatgtaga
From 2 to 11 Gene: atgccctaa
From 14 to 29 Gene: atgtctgtaatgtag

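I implemented the same algorithm using the regular expression below, and it turned out to be simpler than the code above.

Using a regular expression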

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindMultiGenes5 {
    /*(?i) : Case insensitive match
     * ATG : Starts with ATG
     * (\\w{3})*? : smallest string with length multiple of 3
     * (TGA|TAA|TAG) : one of TGA, TAA or TAG
     */
    private static final String GENE_REGEX = "(?i)ATG(\\w{3})*?(TGA|TAA|TAG)";

    public void regexMatch(String dna) {
        Matcher matcher = Pattern.compile(GENE_REGEX).matcher(dna);
        while(matcher.find()) {
            System.out.println("From " + matcher.start() + " to " + matcher.end() + " Gene: " + matcher.group());
        }
    }

    public static void main(String[] args) {

        FindMultiGenes5 FMG = new FindMultiGenes5();

        String[] dnaSamples = {"CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA",
                "catgtaatagatgaatgactgatagatatgcttgtatgctatgaaaatgtgaaatgaccca",
                "cAtGtAaTaGaTgAaTgAcTgAtAgAtAtGcTtGtAtGcTaTgAaAaTgTgAaAtGaCcCa",
                "ATGAAATGAAAA",
                "ccatgccctaataaatgtctgtaatgtaga"};
        /*String[] dnaSamples = {"ATGaaabbbATGTGATAATGA".toLowerCase()};*/

        for(String dna : dnaSamples) {
            System.out.println("DNA: " + dna);
            FMG.regexMatch(dna);
            System.out.println("");
        }
    }
}

Regex output

DNA: CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA
From 1 to 7 Gene: ATGTAA
From 10 to 22 Gene: ATGAATGACTGA
From 27 to 57 Gene: ATGCTTGTATGCTATGAAAATGTGAAATGA

DNA: catgtaatagatgaatgactgatagatatgcttgtatgctatgaaaatgtgaaatgaccca
From 1 to 7 Gene: atgtaa
From 10 to 22 Gene: atgaatgactga
From 27 to 57 Gene: atgcttgtatgctatgaaaatgtgaaatga

DNA: cAtGtAaTaGaTgAaTgAcTgAtAgAtAtGcTtGtAtGcTaTgAaAaTgTgAaAtGaCcCa
From 1 to 7 Gene: AtGtAa
From 10 to 22 Gene: aTgAaTgAcTgA
From 27 to 57 Gene: AtGcTtGtAtGcTaTgAaAaTgTgAaAtGa

DNA: ATGAAATGAAAA
From 0 to 9 Gene: ATGAAATGA

DNA: ccatgccctaataaatgtctgtaatgtaga
From 2 to 11 Gene: atgccctaa
From 14 to 29 Gene: atgtctgtaatgtag

Answer 2

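OK, you had (and still have) several problems. Since I don't know the end goal or the rules of the algorithm, I can't help with everything.

However, I made a few changes that seem to work in the end. First, sequence must be passed as the argument instead of dna in the line: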

int stop = findStopIndex(dna, index);

becomes

int stop = findStopIndex(sequence, index);

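Next, you will notice that the newIndex variable doesn't do much: its value always stays 0, so the check for -1 is irrelevant. I changed the value in the output to (stop + 3). Note the parentheses: without them, + is treated as string concatenation, so the 3 is appended as text instead of being added.

Some other nice improvements include declaring your values as constants instead of hard-coding them: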
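A small sketch of the parentheses point (ConcatDemo and its two methods are hypothetical names used only for this illustration). Java evaluates + from left to right, so once the left operand is a String, everything after it is concatenated:

```java
class ConcatDemo {
    // Without parentheses: (" to " + newIndex) + 3, so the 3 is appended as text.
    static String withoutParens(int newIndex) {
        return " to " + newIndex + 3;
    }

    // With parentheses: the addition happens first, then the concatenation.
    static String withParens(int stop) {
        return " to " + (stop + 3);
    }

    public static void main(String[] args) {
        System.out.println(withoutParens(0)); // prints " to 03"
        System.out.println(withParens(4));    // prints " to 7"
    }
}
```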

private final String[] STOP_SEQUENCES = {"tga", "taa", "tag"};
private final String START_SEQ = "atg";

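As a general rule, try to avoid duplicated code. In findStopIndex(String dna, int index), the same logic is repeated three times, once per stop sequence. That is fine until there are more of them. What if there were 50,000 stop sequences?

So the method can be split up and made more general: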

public int findStopIndex(String dna, int index) {
    // dna.length() stands for "no stop sequence found"
    int minStop = dna.length();
    for (String stopSeq : STOP_SEQUENCES) {
        // Only the first occurrence of each stop sequence is checked
        int stop = dna.indexOf(stopSeq, index);
        if (!hasStop(stop, index)) {
            stop = dna.length();
        }
        minStop = Math.min(minStop, stop);
    }
    return minStop;
}

// True if a stop sequence was found and lies a multiple of 3 away from index
public boolean hasStop(int stop, int index) {
    return stop != -1 && (stop - index) % 3 == 0;
}

The printAll(String dna) method:

public void printAll(String dna) {
    String sequence = dna.toLowerCase();
    int index = 0;
    while (true) {
        index = sequence.indexOf(START_SEQ, index);
        if (index == -1) {
            break;
        }
        int stop = findStopIndex(sequence, index);
        if (stop != sequence.length()) {
            System.out.println("From " + (index) + " to " + (stop + 3) + " Gene: " + sequence.substring(index, stop + 3));
            index = stop;
        } else {
            index = index + 3;
        }
    }
}

Note the change that was made:

index = sequence.substring(index, stop + 3).length();

is now

index = stop;

to avoid an infinite loop.
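To illustrate why the old update can loop forever, here is a rough sketch on a made-up input (IndexUpdateDemo, originalUpdate, and fixedUpdate are hypothetical names). The old expression yields the length of the gene, not a position in the sequence, so the next search can start before the start codon that was just processed:

```java
class IndexUpdateDemo {
    // Original update: the LENGTH of the gene, which is not a position in the sequence.
    static int originalUpdate(String sequence, int index, int stop) {
        return sequence.substring(index, stop + 3).length();
    }

    // Corrected update from the answer: continue searching from the stop codon.
    static int fixedUpdate(int stop) {
        return stop;
    }

    public static void main(String[] args) {
        String sequence = "ccccccccccatgtaa";           // made-up example input
        int index = sequence.indexOf("atg");            // 10
        int stop = sequence.indexOf("taa", index);      // 13
        // The gene has length 6, so the next search starts at 6 and finds
        // the same start codon at 10 again: the loop never advances.
        System.out.println(sequence.indexOf("atg", originalUpdate(sequence, index, stop))); // prints 10
        // Searching from the stop codon moves past the gene.
        System.out.println(sequence.indexOf("atg", fixedUpdate(stop)));                     // prints -1
    }
}
```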

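Problems like this are easy to track down with built-in debugging tools. Any decent Java IDE should have a debugger. For more information, look into debugging with an IDE; for example, here is how to debug with NetBeans: NetBeans Debugging

Here is one for Eclipse: Eclipse Debugging

Beyond that, even though this is a small program, printing or logging values at certain points in the program also helps a lot when tracking down unexpected output.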
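As a rough sketch of that kind of tracing (TraceDemo and startPositions are hypothetical names, and the loop is a simplified stand-in for the real printAll), one trace line per iteration makes it obvious whether the index is advancing:

```java
import java.util.ArrayList;
import java.util.List;

class TraceDemo {
    // Hypothetical helper: collects every "atg" position, tracing each iteration.
    static List<Integer> startPositions(String sequence) {
        List<Integer> starts = new ArrayList<>();
        int index = 0;
        while ((index = sequence.indexOf("atg", index)) != -1) {
            // Trace output goes to stderr so it does not mix with the program's results.
            System.err.println("trace: found atg at index=" + index);
            starts.add(index);
            index += 3; // always move past the match so the loop terminates
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startPositions("catgtaatagatgaatga")); // prints [1, 10, 14]
    }
}
```

If the same index shows up in the trace twice in a row, the loop is stuck, which is the symptom the old index update would produce.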

Debugging my failing code
