Question

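First part of my question: when I try to run my program, it loads forever and never displays a result. Could someone look over my code and spot the error? The program is supposed to find the start codon ATG, keep searching until it reaches a stop codon (TAA, TAG, or TGA), and then print the gene from start to stop. I am using BlueJ.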

Second part of my question: I am supposed to write a program that carries out the following steps:

To find the first gene, find the start codon ATG.
Next look immediately past ATG for the first occurrence of each of the three stop codons TAG, TGA, and TAA.
If the length of the substring between ATG and any of these three stop codons is a multiple of three, then a candidate for a gene is the start codon through the end of the stop codon.
If there is more than one valid candidate, the smallest such string is the gene. The gene includes the start and stop codon.
If no start codon was found, then you are done.
If a start codon was found, but no gene was found, then start searching for another gene via the next occurrence of a start codon starting immediately after the start codon that didn't yield a gene.
If a gene was found, then start searching for the next gene immediately after this found gene.

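Note that according to this algorithm, for the DNA string "ATGCTGACCTGATAG", ATGCTGACCTGATAG could be a gene, but ATGCTGACCTGA would not be, even though it is shorter, because another instance of 'TGA' is found first, and that instance is not a multiple of three away from the start codon.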
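The note can be checked with a short sketch that measures how far the first occurrence of each stop codon lies from the end of the start codon. The class name StopCodonGaps and the helper gapAfterStart are made up for illustration and are not part of the assignment.

```java
class StopCodonGaps {
    // Hypothetical helper: distance from the end of the first ATG to the first
    // occurrence of codon, or -1 if there is no ATG or no such codon after it.
    static int gapAfterStart(String dna, String codon) {
        int start = dna.indexOf("ATG");
        if (start == -1) {
            return -1;
        }
        int stop = dna.indexOf(codon, start + 3);
        return stop == -1 ? -1 : stop - (start + 3);
    }

    public static void main(String[] args) {
        String dna = "ATGCTGACCTGATAG";
        for (String codon : new String[] {"TAG", "TGA", "TAA"}) {
            int gap = gapAfterStart(dna, codon);
            // Only a gap that is a multiple of three yields a gene candidate.
            boolean candidate = gap != -1 && gap % 3 == 0;
            System.out.println(codon + ": gap " + gap + ", candidate: " + candidate);
        }
    }
}
```

The first TGA sits one base past ATG, so it is rejected and the later TGA is never considered. The first TAG sits nine bases past ATG, so the whole string is the only candidate.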

In my assignment, I was also asked to write these methods:

Specifically, to implement the algorithm, you should do the following.

Write the method findStopIndex that has two parameters dna and index, where dna is a String of DNA and index is a position in the string. This method finds the first occurrence of each stop codon to the right of index. From those stop codons that are a multiple of three from index, it returns the smallest index position. It should return -1 if no stop codon was found and there is no such position. This method was discussed in one of the videos.
Write the void method printAll that has one parameter dna, a String of DNA. This method should print all the genes it finds in DNA. This method should repeatedly look for a gene, and if it finds one, print it and then look for another gene. This method should call findStopIndex. This method was also discussed in one of the videos.
Write the void method testFinder that will use the two small DNA example strings shown below. For each string, it should print the string, and then print the genes found in the string. Here is sample output that includes the two DNA strings:

Sample output is:

ATGAAATGAAAA

Gene found is:

ATGAAATGA

DNA string is:

ccatgccctaataaatgtctgtaatgtaga

Genes found are:

atgccctaa

atgtctgtaatgtag

DNA string is:

CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA

Genes found are:

ATGTAA

ATGAATGACTGATAG

ATGCTATGA

ATGTGA

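I have thought it over and found that this code is close to working. I just need the output to match what the instructions ask for. I hope this isn't too messy; I just don't know how to look for a stop codon after the start codon, or how to extract the gene sequence once I have one. I would also like to understand how to get the closest gene sequence by working out which of the three stop codons (TAG, TGA, TAA) is nearest to the ATG. I know this is a lot, but I hope it all makes sense.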

import edu.duke.*;
import java.io.*;

public class FindMultiGenes {
    public String findGenes(String dnaOri) {
        String gene = new String();
        String dna = dnaOri.toLowerCase();
        int start = -1;
        while(true){
            start = dna.indexOf("atg", start);
            if (start == -1) {
                break;
            }
            int stop = findStopCodon(dna, start); 
            if(stop > start){
                String currGene = dnaOri.substring(start, stop+3);

                System.out.println("From: " + start + " to " + stop + "Gene: "    
                +currGene);}
        }
        return gene;
    } 

    private int findStopCodon(String dna, int start){   
        for(int i = start + 3; i<dna.length()-3; i += 3){
            String currFrameString = dna.substring(i, i+3);

            if(currFrameString.equals("TAG")){
                return i;

            } else if( currFrameString.equals("TGA")){
                return i;

            } else if( currFrameString.equals("TAA")){
                return i;

            }
        }   
        return -1;
    }

    public void testing(){


        FindMultiGenes FMG = new FindMultiGenes();

        String dna =     
        "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";

        FMG.findGenes(dna);




        System.out.println("DNA string is: " + dna);

    } 
}

Answer 1

Change your line start = dna.indexOf("atg", start); to

start = dna.indexOf("atg", start + 1);

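What currently happens is that you find "atg" at some index k, and on the next pass you search the string for the next "atg" starting from k. Since the start position is inclusive, the search finds the next match at exactly the same location. You therefore find the same index k over and over again and never stop.

By increasing the index by 1, you skip past the currently found index k and start searching for the next match from k+1 onwards.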
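As a minimal sketch of that fix in isolation, the loop below collects every start-codon position by resuming each search one past the previous hit. The class name StartCodonScan and the method findStarts are hypothetical names, not part of the question's code. Note also that the question's findStopCodon compares against upper-case "TAG", "TGA", and "TAA" while the string it receives has been lower-cased, so that comparison would probably need the same case on both sides before any gene is printed.

```java
import java.util.ArrayList;
import java.util.List;

class StartCodonScan {
    // Hypothetical helper: returns the index of every "atg", ignoring case.
    static List<Integer> findStarts(String dna) {
        String lower = dna.toLowerCase();
        List<Integer> starts = new ArrayList<>();
        int start = lower.indexOf("atg");
        while (start != -1) {
            starts.add(start);
            // Resume one past the current hit. Searching from 'start' itself
            // would return the same index again, and the loop would never end.
            start = lower.indexOf("atg", start + 1);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findStarts("CATGTAATAGATGAATGA")); // prints [1, 10, 14]
    }
}
```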

Answer 2

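The program is supposed to find the start codon ATG, keep searching until it reaches a stop codon TAA, TAG, or TGA, and then print the gene from start to end.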

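Since the first search always starts at 0, you can set the start index there and then search for the stop codon from the result. Here I did it with one of the stop codons: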

public static void main(String[] args) {

    String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
    String sequence = dna.toLowerCase();
    int index = 0;
    int newIndex = 0;
    while (true) {
        index = sequence.indexOf("atg", index);
        if (index == -1)
            return;
        newIndex = sequence.indexOf("tag", index + 3);
        if (newIndex == -1) // Check needed only if a stop codon is not guaranteed for each start codon.
            return;
        System.out.println("From " + (index + 3) + " to " + newIndex + " Gene: " + sequence.substring(index + 3, newIndex));
        index = newIndex + 3;
    }
}

Output:

From 4 to 7 Gene: taa
From 13 to 22 Gene: aatgactga

Additionally, you can use regular expressions to do much of the work for you:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {

    String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";

    Pattern p = Pattern.compile("ATG([ATGC]+?)TAG");
    Matcher m = p.matcher(dna);

    while (m.find())
        System.out.println("From " + m.start(1) + " to " + m.end(1) + " Gene: " + m.group(1));
}

Output:

From 4 to 7 Gene: TAA
From 13 to 22 Gene: AATGACTGA
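Both snippets above look for a single stop codon and do not check the multiple-of-three rule. The following is one possible sketch of the assignment's findStopIndex and printAll covering all three stop codons. It assumes that index is the position of the start codon, and the class name GeneFinderSketch and the helper findGenes are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GeneFinderSketch {
    private static final String[] STOP_CODONS = {"TAG", "TGA", "TAA"};

    // Assumes dna is upper case and index is the position of the start codon.
    // Looks only at the FIRST occurrence of each stop codon past the start codon
    // and returns the smallest one that is a multiple of three away, or -1.
    static int findStopIndex(String dna, int index) {
        int best = -1;
        for (String codon : STOP_CODONS) {
            int stop = dna.indexOf(codon, index + 3);
            boolean inFrame = stop != -1 && (stop - index) % 3 == 0;
            if (inFrame && (best == -1 || stop < best)) {
                best = stop;
            }
        }
        return best;
    }

    // Hypothetical helper: collects the genes so they can be printed or tested.
    static List<String> findGenes(String dnaOri) {
        String dna = dnaOri.toUpperCase(); // search ignoring case, report original case
        List<String> genes = new ArrayList<>();
        int start = dna.indexOf("ATG");
        while (start != -1) {
            int stop = findStopIndex(dna, start);
            if (stop != -1) {
                genes.add(dnaOri.substring(start, stop + 3));
                start = dna.indexOf("ATG", stop + 3);  // continue after the gene
            } else {
                start = dna.indexOf("ATG", start + 3); // continue after this start codon
            }
        }
        return genes;
    }

    static void printAll(String dna) {
        for (String gene : findGenes(dna)) {
            System.out.println(gene);
        }
    }

    public static void main(String[] args) {
        String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
        System.out.println("DNA string is: " + dna);
        System.out.println("Genes found are:");
        printAll(dna);
    }
}
```

For the three DNA strings in the question, findGenes returns the genes listed in the sample output.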
