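I am working on a Perl script that searches for patterns in a string of nucleotides. So far I have been able to use the following regular expressions: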
my $regex1 = qr/( ([ACGT]{2}) \2{9,} )/x;
my $regex2 = qr/( ([ACGT]{3}) \2{6,} )/x;
my $regex3 = qr/( ([ACGT]{4}) \2{6,} )/x;
for my $regex ($regex1, $regex2, $regex3) {
next unless $seq1 =~ $regex;
printf "Matched %s exactly %d times\n", $2, length($1)/length($2);
printf "Length of sequence: $number \n";
}
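How would I go about doing the following?
- Find both perfect repeats (no interruptions) and imperfect repeats (a run of repeats that may be broken by a nucleotide), with at least 10 repetitions required.
- Print the entire sequence that was found
Sample input - GTCGTGTGTGTGTAGTGTGTGTGTGTGAACTGA
Full current script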
print "Di-, Tri-, Tetra-nucleotide Tandem Repeat Finder v1.0 \n\n";
print "Please specify the file location (DO NOT DRAG/DROP files!) then press ENTER:\n";
$seq = <STDIN>;
#Remove the newline from the filename
chomp $seq;
#open the file or exit
open (SEQFILE, $seq) or die "Can't open '$seq': $!";
#read the dna sequence from the file and store it into the array variable @seq1
@seq1 = <SEQFILE>;
#Close the file
close SEQFILE;
#Put the sequence into a single string as it is easier to search for the motif
$seq1 = join( '', @seq1);
#Remove whitespace
$seq1 =~s/\s//g;
#Count of number of nucleotides
#Initialize the variable
$number = 0;
$number = length $seq1;
#Use regex to say "find N nucleotides and match them at least M more times in a row"
# qr(quotes and compiles)/( ([nucs]{number of nucs in pattern}) \2{number of additional repeats,} )/x (x permits whitespace within the pattern)
my $regex1 = qr/( ([ACGT]{2}) \2{9,} )/x;
my $regex2 = qr/( ([ACGT]{3}) \2{6,} )/x;
my $regex3 = qr/( ([ACGT]{4}) \2{6,} )/x;
#Tell program to use $regex on variable that holds the file
for my $regex ($regex1, $regex2, $regex3) {
next unless $seq1 =~ $regex;
printf "Matched %s exactly %d times\n", $2, length($1)/length($2);
printf "Length of sequence: $number \n";
}
exit;
Answer 0: (score: 0)
I'm not sure I fully understand what you need, but maybe this will give you an idea:
use strict; # You should be using this,
use warnings; # and this.
my $input = 'GTCGTGTGTGTGTAGTGTGTGTGTGTGAACTGA';
my $patt = '[ACGT]{2}'; # Some pattern of interest.
my $intervene = '[ACGT]*'; # Some intervening pattern.
my $m = 7 - 2; # Minimum N of times to find pattern, less 2.
my $rgx = qr/(
($patt) $intervene
(\2 $intervene ){$m,}
\2
)/x;
print $1, "\n" if $input =~ $rgx;
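Note that `[ACGT]*` above lets any number of nucleotides sit between copies. If "imperfect" should mean only a short interruption, one possible variation is to cap the gap. The following is a minimal sketch, not the method from the answer above: the helper name `find_repeat` and its parameters (`$unit_len`, `$min`, `$max_gap`) are hypothetical, and it assumes an interruption of at most `$max_gap` nucleotides between consecutive copies. Setting `$max_gap` to 0 gives the perfect-repeat case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): find the first run
# of at least $min copies of a $unit_len-nucleotide motif, allowing at most
# $max_gap stray nucleotides between consecutive copies.
sub find_repeat {
    my ($seq, $unit_len, $min, $max_gap) = @_;
    my $more = $min - 1;    # copies required after the first one
    my $rgx  = qr/( ([ACGT]{$unit_len}) (?: [ACGT]{0,$max_gap}? \2 ){$more,} )/x;
    return unless $seq =~ $rgx;
    return ($1, $2);        # (whole matched region, repeated motif)
}

my $input = 'GTCGTGTGTGTGTAGTGTGTGTGTGTGAACTGA';

# At least 10 copies of a dinucleotide, with up to 1 interrupting nucleotide.
my ($region, $motif) = find_repeat($input, 2, 10, 1);
print "Motif $motif found in: $region\n" if defined $region;
```

Because the first capture group wraps the whole match, `$region` holds the entire found sequence, interruptions included, which covers the "print the entire sequence" requirement.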
Also, see this question for a better way to read an entire file into a string: What is the best way to slurp a file into a string in Perl?.
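As a rough illustration of that suggestion, here is one common way to slurp a file with core Perl only. It is a sketch, not necessarily the approach recommended in the linked question; the helper name `slurp_sequence` is made up for this example, and it assumes the file contains nothing but the sequence and whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string and strip whitespace.
sub slurp_sequence {
    my ($path) = @_;
    # Three-argument open with a lexical filehandle.
    open my $fh, '<', $path or die "Can't open '$path': $!";
    local $/;                        # no record separator: read everything at once
    my $seq = <$fh>;
    close $fh;
    $seq = '' unless defined $seq;   # guard in case nothing was read
    $seq =~ s/\s+//g;                # drop newlines and other whitespace
    return $seq;
}
```

In the script from the question, this could replace the `open` / `@seq1 = <SEQFILE>` / `join` / `s/\s//g` steps with a single call such as `my $seq1 = slurp_sequence($seq);`.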