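I currently have a BNDM search algorithm in Java, but I want to adapt it so that the letter "N" matches any other letter. For example, the string "NATG" should match "CATG". I'm writing software for nucleotide matching, so sequences will only contain A, G, T, C and N, where N stands for any of A, G, T, C.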
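For example, given the sequence "ATGCN" and the source "ATGATGAATGCC", the program should return the index range of the source that matches the sequence, in this case 7-11. If the sequence matches more than once, every match should be printed. Since the source is usually about a thousand characters long, I want to implement a fast search algorithm. Below is my current BNDM code, but it only supports exact matches.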
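For reference, here is a minimal brute-force sketch of the matching rule described above: an `N` in the sequence accepts any base, and every start position is tried, so overlapping matches are reported too. The class and method names are hypothetical, and it assumes the source contains only A, C, G, T. It does at most n·m character comparisons, which is small for a source of about a thousand characters and a short sequence; it is mainly useful as a baseline to check faster algorithms against.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force reference matcher: 'N' in the pattern accepts any base.
// Hypothetical helper for illustration; O(n*m) comparisons in the worst case.
class NaiveWildcardSearch {

    // True if pattern character p accepts source character c.
    static boolean baseMatches(char p, char c) {
        return p == 'N' || p == c;
    }

    // Returns the start index of every (possibly overlapping) match.
    static List<Integer> findAll(String pattern, String source) {
        List<Integer> starts = new ArrayList<>();
        int m = pattern.length(), n = source.length();
        for (int j = 0; j + m <= n; j++) {
            int i = 0;
            while (i < m && baseMatches(pattern.charAt(i), source.charAt(j + i))) {
                i++;
            }
            if (i == m) {
                starts.add(j);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String pattern = "ATGCN";
        for (int start : findAll(pattern, "ATGATGAATGCC")) {
            // Inclusive range, as in the question
            System.out.println(start + "-" + (start + pattern.length() - 1));
        }
    }
}
```

Run as-is, it prints `7-11`, the range given in the question.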
I'm not sure whether the BNDM algorithm below can do this, and I'm open to other search algorithms.
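One alternative that handles `N` naturally is the Shift-And (bit-parallel) algorithm, a different technique from BNDM: each pattern position gets one bit, and a class such as `N` is expressed by setting that position's bit in the mask of every base. The sketch below is a hypothetical helper, assuming the sequence has 1 to 64 characters (so the state fits in one `long`) and uses only A, C, G, T, N. It reads each source character exactly once, so its running time does not depend on how many `N`s the sequence contains.

```java
import java.util.ArrayList;
import java.util.List;

// Shift-And (bit-parallel) search in which 'N' in the pattern accepts A, C, G or T.
// Hypothetical helper; assumes a pattern of 1..64 characters over A/C/G/T/N.
class ShiftAndWildcard {

    static List<Integer> findAll(String pattern, String source) {
        int m = pattern.length();
        if (m == 0 || m > 64) {
            throw new IllegalArgumentException("pattern length must be 1..64");
        }
        // masks[c] has bit i set when pattern position i accepts character c
        long[] masks = new long[128];
        for (int i = 0; i < m; i++) {
            char p = pattern.charAt(i);
            if ("ACGTN".indexOf(p) < 0) {
                throw new IllegalArgumentException("unexpected character: " + p);
            }
            String accepted = (p == 'N') ? "ACGT" : String.valueOf(p);
            for (char c : accepted.toCharArray()) {
                masks[c] |= 1L << i;
            }
        }
        List<Integer> starts = new ArrayList<>();
        long state = 0;                  // bit i: pattern[0..i] matches the text ending at j
        long accept = 1L << (m - 1);     // set when the whole pattern has matched
        for (int j = 0; j < source.length(); j++) {
            char c = source.charAt(j);
            long mask = (c < 128) ? masks[c] : 0L;
            state = ((state << 1) | 1L) & mask;
            if ((state & accept) != 0) {
                starts.add(j - m + 1);   // the match occupies j-m+1 .. j
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String pattern = "ATGCN";
        for (int start : findAll(pattern, "ATGATGAATGCC")) {
            System.out.println(start + "-" + (start + pattern.length() - 1));
        }
    }
}
```

Like the brute-force version, this reports overlapping matches, because the state is updated at every source position rather than skipping ahead.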
Here is my current BNDM code:
    import java.util.Scanner;

    public class BNDM {
        public static void main(String[] args) {
            Scanner sc = new Scanner(System.in);
            String source, pattern;
            System.out.print("Enter sequence:");
            pattern = sc.nextLine();
            System.out.print("Enter source:");
            source = sc.nextLine();
            if (pattern.equals(source)) {
                System.out.println("Sequence = Source");
            }
            char[] x = pattern.toCharArray(), y = source.toCharArray();
            int i, j, s, d, last, m = x.length, n = y.length;
            // Each mask is a single int, so this BNDM supports 1..32 pattern characters
            if (m == 0 || m > 32) {
                System.out.println("Sequence length must be between 1 and 32");
                return;
            }
            int[] b = new int[65536]; // one bit mask per char value; Java zero-initializes it
            /* Preprocessing: bit (m-1-i) of b[c] is set when x[i] == c */
            s = 1;
            for (i = m - 1; i >= 0; i--) {
                b[x[i]] |= s;
                s <<= 1;
            }
            /* Searching phase: read each window of length m from right to left */
            j = 0;
            while (j <= n - m) {
                i = m - 1;
                last = m;
                d = ~0;
                while (i >= 0 && d != 0) {
                    d &= b[y[j + i]];
                    i--;
                    if (d != 0) {
                        if (i >= 0) {
                            last = i + 1; // candidate shift for the next window
                        } else {
                            System.out.println("Sequence in Source starting at position:");
                            System.out.println(j);
                            System.out.println("Sequence:");
                            System.out.println(pattern);
                            System.out.println("Source:");
                            System.out.println(source.substring(j, j + m));
                        }
                    }
                    d <<= 1;
                }
                j += last;
            }
        }
    }
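BNDM itself can be adapted, and only the preprocessing has to change: BNDM simulates a bit-parallel automaton, so a pattern position that accepts a class of characters just needs its bit OR-ed into the mask of every character in the class. For `N` that means setting the bit in `b['A']`, `b['C']`, `b['G']` and `b['T']`. The sketch below is a hypothetical refactoring of the code above into a method that returns start positions (each range is `start` to `start + m - 1`); it assumes a sequence of 1 to 32 characters and a source made of A, C, G, T only. It records a shift only when the highest bit of `d` is set, i.e. when the characters read so far form a prefix of the sequence, which is the test in the usual BNDM formulation; the `d != 0` test in the code above also finds every match but can produce shorter shifts.

```java
import java.util.ArrayList;
import java.util.List;

// BNDM in which 'N' in the pattern accepts A, C, G or T.
// Hypothetical helper; assumes a pattern of 1..32 characters (one int mask)
// and a source made of A, C, G, T.
class BndmWildcard {

    static List<Integer> findAll(String pattern, String source) {
        char[] x = pattern.toCharArray(), y = source.toCharArray();
        int m = x.length, n = y.length;
        if (m == 0 || m > 32) {
            throw new IllegalArgumentException("pattern length must be 1..32");
        }
        int[] b = new int[65536];      // one mask per char value, as in the question
        int s = 1;
        for (int i = m - 1; i >= 0; i--) {
            if (x[i] == 'N') {
                // the wildcard change: an N position is accepted by every base
                for (char c : "ACGT".toCharArray()) {
                    b[c] |= s;
                }
            } else {
                b[x[i]] |= s;
            }
            s <<= 1;
        }
        int high = 1 << (m - 1);       // bit of x[0]: what was read so far is a prefix
        List<Integer> starts = new ArrayList<>();
        int j = 0;
        while (j <= n - m) {
            int i = m - 1, last = m, d = ~0;
            while (i >= 0 && d != 0) {
                d &= b[y[j + i]];
                i--;
                if ((d & high) != 0) {
                    if (i >= 0) {
                        last = i + 1;  // a prefix starts at j+i+1: align the next window there
                    } else {
                        starts.add(j); // the whole window matched
                    }
                }
                d <<= 1;
            }
            j += last;
        }
        return starts;
    }

    public static void main(String[] args) {
        String pattern = "ATGCN";
        for (int start : findAll(pattern, "ATGATGAATGCC")) {
            System.out.println(start + "-" + (start + pattern.length() - 1));
        }
    }
}
```

Each `N` makes the masks denser, so the shifts tend to get shorter as the number of `N`s in the sequence grows; with many wildcards the speed advantage over a forward scan shrinks.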
Answer 0 (score: 0)
This kind of matching is easy to do with a regular expression:
    // imports needed for the regex classes
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // wrapper class so the snippet runs; the name is arbitrary
    public class RegexMatch {
        public static void main(String[] args) {
            String pattern = "ATGCN";
            String nucleotides = "ATGATGAATGCC";
            // first convert the pattern into a proper regex,
            // i.e. replace every N with [ATCG]
            Pattern regex = Pattern.compile(pattern.replaceAll("N", "[ATCG]"));
            // create a Matcher to find everywhere the pattern matches
            Matcher m = regex.matcher(nucleotides);
            // find all the matches
            while (m.find()) {
                System.out.println("Match found:");
                System.out.println("start:" + m.start());
                // end() is exclusive (one past the last matched character), so subtract 1
                System.out.println("end:" + (m.end() - 1));
                System.out.println();
            }
        }
    }
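Note that `find()` resumes after the end of the previous match, so overlapping occurrences are skipped: with the sequence `NN` and the source `ACGT` the loop above reports 0-1 and 2-3 but not 1-2. If every occurrence is needed, one common workaround is to wrap the expression in a zero-width lookahead with a capturing group, so each match consumes no characters and the matcher retries one index further. A minimal sketch, with hypothetical class and method names, assuming the sequence contains only A, C, G, T, N (no regex metacharacters):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex search that also reports overlapping matches.
// Hypothetical helper; assumes the sequence uses only A, C, G, T, N.
class OverlappingRegexSearch {

    static List<Integer> findAll(String sequence, String source) {
        // (?=(...)) matches the empty string in front of every occurrence;
        // after each empty match, find() retries starting one index further
        String body = sequence.replace("N", "[ACGT]");
        Matcher m = Pattern.compile("(?=(" + body + "))").matcher(source);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start(1)); // group 1 holds the actual occurrence
        }
        return starts;
    }

    public static void main(String[] args) {
        String sequence = "NN";
        for (int start : findAll(sequence, "ACGT")) {
            // inclusive range; m.end(1) - 1 would give the same end index
            System.out.println(start + "-" + (start + sequence.length() - 1));
        }
    }
}
```

This prints `0-1`, `1-2` and `2-3`, one line per occurrence.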
Answer 1 (score: 0)
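    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Match {
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            String origin = in.next(); // the source, e.g. ATGATGAATGCC
            String match = in.next();  // the sequence, e.g. ATGCN
            // every N becomes an alternation of the four bases
            Pattern pattern = Pattern.compile(match.replaceAll("N", "(A|G|T|C)"));
            Matcher matcher = pattern.matcher(origin);
            while (matcher.find()) {
                // end() is exclusive, so subtract 1 for an inclusive range such as 7-11
                System.out.println(matcher.start() + "-" + (matcher.end() - 1));
            }
        }
    }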