Smith-Waterman similarity calculation exceeds 100%

Time: 2014-07-11 16:59:18

Tags: java algorithm bioinformatics

I wrote some Java code that implements the Smith-Waterman algorithm. I am working from a journal article that says:

sim(X, Y) = 2 * SLength(X, Y) / (Length(X) + Length(Y))

X and Y are the sequences for comparison.
SLength(X, Y) is the string length of the maximum matching set.
Length(X) is the number of characters in the sequence X.
Length(Y) is the number of characters in the sequence Y.
The result of this function (sim) is a real number, 0 <= sim <= 1.
The larger sim is, the stronger the similarity of the two programs is,
and the larger the plagiarism possibility, and vice versa.

Here is my Smith-Waterman code:

import java.util.Stack;

public class SmithWaterman {

private final String one, two;
private final int matrix[][];
private int gap;
private final int match;
private final int o;
private int l;
private final int e;

public SmithWaterman(String one, String two) {
    this.one = "-" + one.toLowerCase();
    this.two = "-" + two.toLowerCase();
    this.match = 2;

    // Define affine gap starting values
    o = -2;
    l = -1;
    e = -1;

    // initialize matrix to 0
    matrix = new int[one.length() + 1][two.length() + 1];
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            matrix[i][j] = 0;
        }
    }

}

// returns the alignment score
/**
 * @return
 */
public double computeSmithWaterman() {
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            gap = o + (l - 1) * e;
            if (i != 0 && j != 0) {
                if (one.charAt(i) == two.charAt(j)) {
                    // match
                    // reset l
                    l = 0;
                    matrix[i][j] = Math.max(0, Math.max(
                            matrix[i - 1][j - 1] + match, Math.max(
                                    matrix[i - 1][j] + gap,
                                    matrix[i][j - 1] + gap)));
                } else {
                    // gap
                    l++;
                    matrix[i][j] = Math.max(0, Math.max(
                            matrix[i - 1][j - 1] + gap, Math.max(
                                    matrix[i - 1][j] + gap,
                                    matrix[i][j - 1] + gap)));
                }
            }
        }
    }

    // find the highest value
    double longest = 0;
    int iL = 0, jL = 0;
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            if (matrix[i][j] > longest) {
                longest = matrix[i][j];
                iL = i;
                jL = j;
            }
        }
    }

    // Backtrack to reconstruct the path
    int i = iL;
    int j = jL;
    Stack<String> actions = new Stack<String>();

    while (i != 0 && j != 0) {
        // diag case
        if (Math.max(matrix[i - 1][j - 1],
                Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i - 1][j - 1]) {
            actions.push("align");
            i = i - 1;
            j = j - 1;
            // left case
        } else if (Math.max(matrix[i - 1][j - 1],
                Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i][j - 1]) {
            actions.push("insert");
            j = j - 1;
            // up case
        } else {
            actions.push("delete");
            i = i - 1;
        }
    }

    int maxMatchSet = actions.size();

    String alignOne = new String();
    String alignTwo = new String();

    Stack<String> backActions = (Stack<String>) actions.clone();
    for (int z = 0; z < one.length(); z++) {
        alignOne = alignOne + one.charAt(z);
        if (!actions.empty()) {
            String curAction = actions.pop();

            if (curAction.equals("insert")) {
                alignOne = alignOne + "-";
                while (actions.peek().equals("insert")) {
                    alignOne = alignOne + "-";
                    actions.pop();
                }
            }
        }
    }

    for (int z = 0; z < two.length(); z++) {
        alignTwo = alignTwo + two.charAt(z);
        if (!backActions.empty()) {
            String curAction = backActions.pop();
            if (curAction.equals("delete")) {
                alignTwo = alignTwo + "-";
                while (backActions.peek().equals("delete")) {
                    alignTwo = alignTwo + "-";
                    backActions.pop();
                }
            }
        }
    }
    int minMatchSet = backActions.size();

    // print alignment
    double realLengthStringOne = one.length() - 1;
    double realLenghtStringTwo = two.length() - 1;
    double totalOfMatricesElement = realLengthStringOne + realLenghtStringTwo;

    double value = (2 * maxMatchSet / totalOfMatricesElement) * 100;

    System.out.println("2 * " + maxMatchSet + " / " + "( " + realLengthStringOne + " + " + realLenghtStringTwo + " ) " + "= " + value + "%");


    return value;
}

public void printMatrix() {

    for (int i = 0; i < one.length(); i++) {
        if (i == 0) {
            for (int z = 0; z < two.length(); z++) {
                if (z == 0) {
                    System.out.print("  \t");
                }
                System.out.print(two.charAt(z) + " \t");

                if (z == two.length() - 1) {
                    System.out.println();
                }
            }
        }

        for (int j = 0; j < two.length(); j++) {
            if (j == 0) {
                System.out.print(one.charAt(i) + " \t");
            }
            System.out.print(matrix[i][j] + " \t");
        }
        System.out.println();
    }
    System.out.println();
}

public static void main(String[] args) {
    // DNA sequence Test:

    SmithWaterman sw = new SmithWaterman("ahmad", "achmad");
    System.out.println("Alignment Score: " + sw.computeSmithWaterman());

    sw.printMatrix();

}
}

If I have two sequences such as "ahmad" and "ahmad", the output is 100%.

But if I have two sequences such as "ahmad" and "achmad", the output looks like this:

2 * 6 / ( 5.0 + 6.0 ) = 109.09090909090908%

Alignment Score: 109.09090909090908
  -     a   c   h   m   a   d   
-   0   0   0   0   0   0   0   
a   0   2   1   0   0   2   1   
h   0   0   0   3   2   0   0   
m   0   0   0   0   5   4   2   
a   0   2   1   0   2   7   6   
d   0   0   0   0   0   1   9   

Is my implementation correct, and where did I go wrong in the code?

2 answers:

Answer 0 (score: 1)

To answer your direct question of why you get 150%:

Your variable longest = 6, and you set the values of one and two to '-' + one and '-' + two respectively, so the math is 2 * 6 / 8 = 12 / 8 = 1.5, and 1.5 * 100 = 150%.

If you use the original lengths of one and two, you will probably get the correct answer.

However, I think your approach may be flawed:

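Your variable longest is not the length of the match but the highest score in the matrix. That is the Smith-Waterman alignment score. It happens to work out right now because you are aligning a perfect match and using a match score of +2, but I am not sure it will work for imperfect matches.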

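This value represents the best-scoring (partial) alignment path through the matrix. While this path is usually the longest path, it does not have to be. There may be a longer but worse-scoring path elsewhere.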

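Also, your gap open penalty of -2 and gap extension penalty of -1 mean that multiple consecutive gaps will make your match score no longer an even number.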
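The parity point can be checked with a little arithmetic. The sketch below is only an illustration: the class AffineGapParity and its methods are hypothetical names, and it assumes the conventional affine form (the first gap position pays the open penalty, each further position pays the extension penalty), which is not exactly the running-counter formula in the question's code.

```java
// Hypothetical helper for illustration; not part of the question's code.
class AffineGapParity {
    static final int MATCH = 2;        // match score used in the question
    static final int GAP_OPEN = -2;    // "o" in the question
    static final int GAP_EXTEND = -1;  // "e" in the question

    // Assumed conventional affine cost: the first gap position pays GAP_OPEN,
    // every further position in the same run pays GAP_EXTEND.
    static int gapCost(int gapLength) {
        return gapLength == 0 ? 0 : GAP_OPEN + (gapLength - 1) * GAP_EXTEND;
    }

    // Score of an alignment with the given number of matches and one gap run.
    static int score(int matches, int gapLength) {
        return matches * MATCH + gapCost(gapLength);
    }

    public static void main(String[] args) {
        System.out.println(score(5, 0)); // 10: no gaps, even
        System.out.println(score(5, 1)); // 8: a single gap position, still even
        System.out.println(score(5, 2)); // 7: two consecutive gap positions, odd
    }
}
```

Under these assumptions, dividing the score by the match value no longer recovers the number of matched characters once a gap run is longer than one position.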

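To actually see how long the alignment is, you have to trace back through the matrix from the highest-scoring position until you reach a cell with a score of 0. (This is because Smith-Waterman allows local alignments, as opposed to full-length global alignments.)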

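You already do something similar to this in your actions code block. However, you may need to consider how insertions and deletions count toward the length. If you want to count them, then the alignment length is simply actions.size().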
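A minimal sketch of that traceback, under stated assumptions: it uses a simple linear gap penalty (match +2, mismatch -1, gap -1) rather than the question's affine scheme, and the class LocalAlignmentLength and its methods are hypothetical names. It follows the move that actually produced each cell's score, stops at the first 0 cell, and counts matched characters separately from alignment columns so that either can be plugged into the sim formula.

```java
// Hypothetical sketch; scores and names are assumptions, not the question's code.
class LocalAlignmentLength {
    static final int MATCH = 2, MISMATCH = -1, GAP = -1; // assumed linear scoring

    /** Returns {matches, columns} for one best-scoring local alignment. */
    static int[] align(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] h = new int[n + 1][m + 1]; // row 0 and column 0 stay 0
        int best = 0, bi = 0, bj = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int score = Math.max(0, h[i - 1][j - 1] + sub);
                score = Math.max(score, h[i - 1][j] + GAP);
                score = Math.max(score, h[i][j - 1] + GAP);
                h[i][j] = score;
                if (score > best) { best = score; bi = i; bj = j; }
            }
        }
        // Trace back from the best cell until a cell with score 0 is reached.
        int matches = 0, columns = 0;
        int i = bi, j = bj;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
            if (h[i][j] == h[i - 1][j - 1] + sub) {        // diagonal move
                if (sub == MATCH) matches++;
                i--; j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {     // gap in b
                i--;
            } else {                                       // gap in a
                j--;
            }
            columns++;
        }
        return new int[] {matches, columns};
    }

    /** sim = 2 * matches / (Length(X) + Length(Y)), using the unpadded lengths. */
    static double similarity(String a, String b) {
        return 2.0 * align(a, b)[0] / (a.length() + b.length());
    }

    public static void main(String[] args) {
        int[] r = align("ahmad", "achmad");
        System.out.println(r[0] + " matches in " + r[1] + " columns"); // 5 matches in 6 columns
        System.out.println(similarity("ahmad", "ahmad"));              // 1.0
    }
}
```

With this scoring, "ahmad" against "achmad" gives 5 matched characters across 6 alignment columns. Counting only the matches keeps sim at or below 1 (here 10 / 11), whereas counting all 6 columns reproduces the 12 / 11 result from the question.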

Answer 1 (score: 0)

It seems the problem is the type of your "longest" variable. It should be an int; you are using it as a double.

As shown below:

System.out.println(12.0/8);
System.out.println(12/8);

The output will be:

 1.5  
 1

This is because when an operand is a double the result is also a double, but when both operands are ints the result is also an int.
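To make the difference concrete for a percentage calculation, here is a small sketch. The class DivisionDemo and its method names are hypothetical, and the values 6 and 8 are taken from the 2 * 6 / 8 example in the other answer.

```java
// Hypothetical demo of integer versus floating-point division in Java.
class DivisionDemo {
    // All operands are ints, so 2 * matched / total truncates before the * 100.
    static double percentWithInts(int matched, int total) {
        return (2 * matched / total) * 100;
    }

    // The literal 2.0 promotes the whole expression to double arithmetic.
    static double percentWithDoubles(int matched, int total) {
        return (2.0 * matched / total) * 100;
    }

    public static void main(String[] args) {
        System.out.println(percentWithInts(6, 8));    // 100.0
        System.out.println(percentWithDoubles(6, 8)); // 150.0
    }
}
```

Note that in the question's code the divisor totalOfMatricesElement is already a double, so that division is performed in floating point. Integer truncation would only hide an oversized ratio, not correct the length being counted.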