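I got some great input here on searching a string of nucleotides for a repeating 3-nucleotide pattern that must occur at least 7 times in a row, by constructing this regex: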
my $regex1 = qr/( ([ACGT]{3}) \2{6,} )/x;
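I know how to extend it to search for 2-nucleotide units repeated 10 times in a row and 4-nucleotide units repeated 7 times in a row.
But I want to extend the code so that users can point to their input file and have it checked against the regex above as well as the two other regexes I need to create for the other two searches.
Edit: How do I run the input file against multiple regexes? I have created the two other regexes in my code (commented out with hash signs).
Here is my current code: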
print "Please specify the file location (DO NOT DRAG/DROP files!) then press ENTER:\n";
$seq = <STDIN>;
#Remove the newline from the filename
chomp $seq;
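#open the file for reading or exit (three-argument open, so odd characters in the filename are not treated as a mode)
open (my $seqfile, '<', $seq) or die "Can't open '$seq': $!";
#read the dna sequence from the file and store it into the array variable @seq1
@seq1 = <$seqfile>;
#Close the file
close $seqfile;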
#Put the sequence into a single string as it is easier to search for the motif
$seq1 = join( '', @seq1);
#Remove whitespace
$seq1 =~s/\s//g;
#Count of number of nucleotides
#Initialize the variable
$number = 0;
$number = length $seq1;
#Use regex to say "Find 3 nucleotides, then match that same triplet at least 6 more times" (7 or more copies in a row)
# qr(quotes and compiles)/( ([nucs]{number of nucs in pattern}) \2{number of additional repeats,} )/x(permit whitespace within pattern)
my $regex1 = qr/( ([ACGT]{3}) \2{6,} )/x;
#my $regex = qr/( ([ACGT]{2}) \2{9,} )/x;
#my $regex2 = qr/( ([ACGT]{4}) \2{6,} )/x;
#Tell program to use $regex1 on variable that holds the file, and only report if it matched
if ($seq1 =~ $regex1) {
#Now print the results to screen
#This will need to change to printing to a file (WHAT KIND OF FILE?) in the following manner: site, nucleotide match, # of times, length of full sequence
printf "MATCHED %s exactly %d times\n", $2, length($1)/3;
}
print "Length of sequence: $number\n";
exit;
Answer 0 (score: 1)
Just use a for loop. Something like:
for my $regex ($regex1, $regex2, $regex3) {
next unless $seq1 =~ $regex;
printf "MATCHED %s exactly %d times\n", $2, length($1)/length($2);
}
But you will probably want to change the output so that it describes the results better.
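As one possible way to make the output more descriptive, here is a minimal sketch that labels each search and reports every run it finds along with its position. The helper `build_repeat_regex`, the labels in `@searches`, and the demo sequence are all made up for illustration; in the real script `$seq1` would hold the contents of the input file.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex for a unit of $unit_len nucleotides
# repeated at least $min_repeats times in a row.
sub build_repeat_regex {
    my ($unit_len, $min_repeats) = @_;
    # The capture group matches the first copy, so \2 needs one fewer repeat.
    my $extra = $min_repeats - 1;
    return qr/( ([ACGT]{$unit_len}) \2{$extra,} )/x;
}

# Assumed search settings: [label, unit length, minimum consecutive repeats]
my @searches = (
    [ 'dinucleotide',    2, 10 ],
    [ 'trinucleotide',   3, 7  ],
    [ 'tetranucleotide', 4, 7  ],
);

# Made-up demo sequence standing in for the file contents
my $seq1 = 'TT' . ('CAG' x 8) . 'GG' . ('AT' x 10) . 'C';

my @report;
for my $search (@searches) {
    my ($label, $unit_len, $min_repeats) = @$search;
    my $regex = build_repeat_regex($unit_len, $min_repeats);

    # /g inside while visits every non-overlapping match, not only the first
    while ($seq1 =~ /$regex/g) {
        my ($run, $unit) = ($1, $2);
        my $start = $-[1] + 1;    # 1-based start position of the whole run
        push @report, sprintf("%s: %s x %d at position %d",
            $label, $unit, length($run) / length($unit), $start);
    }
}

print "$_\n" for @report;
print "Length of sequence: ", length($seq1), "\n";
```

With the demo sequence this should report the `AT` run under the dinucleotide label and the `CAG` run under the trinucleotide label, and nothing for the tetranucleotide search. Keeping the settings in one list means a new search is a one-line addition rather than another hand-written regex.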