我编写了一些实现了Smith-Waterman算法的Java代码。我有一个日记条目here,就像这样说
sim (X, Y) = 2 . SLength (X, Y) / Length ( X) + Length (Y )
X and Y are the sequences for comparison.
SLength(X, Y) is the string length of the maximum matching set.
Length(X) is the number of characters in the sequence X.
Length(Y) is the number of characters in the sequence Y.
The result of this function (sim) is a real number, O<=sim<=1.
The larger SIM is, the stronger the two programs similarity is,
and the plagiarism possibility larger, vice versa.
这是我的史密斯 - 沃特曼代码
public class SmithWaterman {
private final String one, two;
private final int matrix[][];
private int gap;
private final int match;
private final int o;
private int l;
private final int e;
public SmithWaterman(String one, String two) {
this.one = "-" + one.toLowerCase();
this.two = "-" + two.toLowerCase();
this.match = 2;
// Define affine gap starting values
o = -2;
l = -1;
e = -1;
// initialize matrix to 0
matrix = new int[one.length() + 1][two.length() + 1];
for (int i = 0; i < one.length(); i++) {
for (int j = 0; j < two.length(); j++) {
matrix[i][j] = 0;
}
}
}
// returns the alignment score
/**
* @return
*/
public double computeSmithWaterman() {
for (int i = 0; i < one.length(); i++) {
for (int j = 0; j < two.length(); j++) {
gap = o + (l - 1) * e;
if (i != 0 && j != 0) {
if (one.charAt(i) == two.charAt(j)) {
// match
// reset l
l = 0;
matrix[i][j] = Math.max(0, Math.max(
matrix[i - 1][j - 1] + match, Math.max(
matrix[i - 1][j] + gap,
matrix[i][j - 1] + gap)));
} else {
// gap
l++;
matrix[i][j] = Math.max(0, Math.max(
matrix[i - 1][j - 1] + gap, Math.max(
matrix[i - 1][j] + gap,
matrix[i][j - 1] + gap)));
}
}
}
}
// find the highest value
double longest = 0;
int iL = 0, jL = 0;
for (int i = 0; i < one.length(); i++) {
for (int j = 0; j < two.length(); j++) {
if (matrix[i][j] > longest) {
longest = matrix[i][j];
iL = i;
jL = j;
}
}
}
// Backtrack to reconstruct the path
int i = iL;
int j = jL;
Stack<String> actions = new Stack<String>();
while (i != 0 && j != 0) {
// diag case
if (Math.max(matrix[i - 1][j - 1],
Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i - 1][j - 1]) {
actions.push("align");
i = i - 1;
j = j - 1;
// left case
} else if (Math.max(matrix[i - 1][j - 1],
Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i][j - 1]) {
actions.push("insert");
j = j - 1;
// up case
} else {
actions.push("delete");
i = i - 1;
}
}
int maxMatchSet = actions.size();
String alignOne = new String();
String alignTwo = new String();
Stack<String> backActions = (Stack<String>) actions.clone();
for (int z = 0; z < one.length(); z++) {
alignOne = alignOne + one.charAt(z);
if (!actions.empty()) {
String curAction = actions.pop();
if (curAction.equals("insert")) {
alignOne = alignOne + "-";
while (actions.peek().equals("insert")) {
alignOne = alignOne + "-";
actions.pop();
}
}
}
}
for (int z = 0; z < two.length(); z++) {
alignTwo = alignTwo + two.charAt(z);
if (!backActions.empty()) {
String curAction = backActions.pop();
if (curAction.equals("delete")) {
alignTwo = alignTwo + "-";
while (backActions.peek().equals("delete")) {
alignTwo = alignTwo + "-";
backActions.pop();
}
}
}
}
int minMatchSet = backActions.size();
// print alignment
double realLengthStringOne = one.length() - 1;
double realLenghtStringTwo = two.length() - 1;
double totalOfMatricesElement = realLengthStringOne + realLenghtStringTwo;
double value = (2 * maxMatchSet / totalOfMatricesElement) * 100;
System.out.println("2 * " + maxMatchSet + " / " + "( " + realLengthStringOne + " + " + realLenghtStringTwo + " ) " + "= " + value + "%");
return value;
}
public void printMatrix() {
for (int i = 0; i < one.length(); i++) {
if (i == 0) {
for (int z = 0; z < two.length(); z++) {
if (z == 0) {
System.out.print(" \t");
}
System.out.print(two.charAt(z) + " \t");
if (z == two.length() - 1) {
System.out.println();
}
}
}
for (int j = 0; j < two.length(); j++) {
if (j == 0) {
System.out.print(one.charAt(i) + " \t");
}
System.out.print(matrix[i][j] + " \t");
}
System.out.println();
}
System.out.println();
}
public static void main(String[] args) {
// DNA sequence Test:
SmithWaterman sw = new SmithWaterman("ahmad", "achmad");
System.out.println("Alignment Score: " + sw.computeSmithWaterman());
sw.printMatrix();
}
}
如果我有两个序列,如“ahmad”,“ahmad”,则输出= 100%,
但是你知道,如果我有两个像“ahmad”,“achmad”的序列,输出是这样的:
2 * 6 / ( 5.0 + 6.0 ) = 109.09090909090908%
Alignment Score: 109.09090909090908
- a c h m a d
- 0 0 0 0 0 0 0
a 0 2 1 0 0 2 1
h 0 0 0 3 2 0 0
m 0 0 0 0 5 4 2
a 0 2 1 0 2 7 6
d 0 0 0 0 0 1 9
我是否真的在实施中,我在哪里丢失了代码?
答案 0 :(得分:1)
回答你的直接问题,为什么你得到150%
您的变量longest = 6
并且您将值one
和two
分别设为'-' + one
和'-'+ two
,因此数学为2 * 6 / 8 = 12 /8 = 1.5
* 100 = 150%。
如果您使用one
和two
的原始长度,则可能会得到正确的答案。
但是,我认为您的方法可能存在缺陷:
您的变量longest
不是匹配的长度,而是矩阵中的最高分数。这是史密斯 - 沃特曼对齐得分。现在可以解决这个问题,因为你正在调整一个完美的匹配并使用+2
的匹配分数,但是我不确定这对于非完美匹配是否有效。
此值表示通过矩阵的最佳评分(部分)对齐路径。虽然这条路径通常是最长的路径,但并非必须如此。其他地方可能会有更长但更差的得分路径。
此外,您的开放差距-2
和分机-1
的匹配罚分意味着多个连续的差距将使您的匹配分数不再是偶数。
要实际查看对齐的时间长度,您必须从最高得分位置开始追踪矩阵,直到得到0
得分的矩阵中的单元格。 (因为这是Smith-Waterman,它允许与全长全局比对相对的局部比对)。
您已在actions
代码块中执行与此类似的操作。但是,您可能需要考虑插入和删除如何被视为长度的一部分。如果你想计算它们,那么最长的对齐长度只是actions.size()
答案 1 :(得分:0)
似乎你改变了最长的&#34;变量。它应该是 int ;您将其用作 double 。
如下所示:
System.out.println(12.0/8);
System.out.println(12/8);
输出将是:
1.5
1
因为变量是double的结果也会是double但是如果你使用int结果也会是double。