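I'm new to Perl and regular expressions. I need an example of how to write a program that finds perfect palindromes in multiple protein sequences (say, four sequences of about 200 amino acids each, stored in a file). I need to extract the palindromes and report the position of each one within its sequence.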
>TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAE
LFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWF
YNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNI
NFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH
>TRE|O06435|O06435 (492 AA) SynE [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
IFVSQRYPVSDEVYYKTIVETLNQMSLRIEGKIFIKLHPKEMENKNIMSLFLNMVTINPR
>TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
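I need the perfect palindromes in these sequences, along with their positions, as output. I have read many articles but couldn't come up with a good approach. Please suggest some techniques and a program.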
Answer 0 (score: 1)
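This challenge needs three regex features:
Recursive subpatterns - match a palindrome of any length (the (?1) in the pattern below).
Lookahead assertions - wrapping the capture in (?=...) matches without consuming text, so overlapping palindromes are found.
perlretut - Position Information - determine where in the string a match was found (here via $-[0]).
Putting these together gives: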
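use strict;
use warnings;

# Palindrome body. Once interpolated into /(?=($pp))/ below, the enclosing
# capture is group 1, so (?1) recurses into the whole palindrome and
# \g{-1} refers back to the (\w) opened just before it.
my $pp = qr/(?: (\w) (?1) \g{-1} | \w? )/ix;

# Paragraph mode: each blank-line-separated block of DATA is one record.
local $/ = '';

while (<DATA>) {
    chomp;
    my ($header, @lines) = split "\n";
    my $data = join '', @lines;
    print "$header\n$data\n";

    # $-[0] is the 0-based offset of the match; palindromes shorter than 3 are skipped.
    while ($data =~ /(?=($pp))/g) {
        print "$-[0] - $1\n" if length($1) > 2;
    }
}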
__DATA__
>TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAE
LFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWF
YNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNI
NFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH

>TRE|O06435|O06435 (492 AA) SynE [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
IFVSQRYPVSDEVYYKTIVETLNQMSLRIEGKIFIKLHPKEMENKNIMSLFLNMVTINPR

>TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
Output:
>TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAELFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWFYNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNINFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH
6 - LKKL
29 - KNK
40 - KIK
42 - KSK
46 - IFI
66 - SRRS
86 - LIL
123 - YKY
131 - KAK
146 - IFI
164 - GTG
165 - TGT
172 - KKK
178 - NIN
211 - KEK
220 - FEF
>TRE|O06435|O06435 (492 AA) SynE [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNNLLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTIQPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNNLHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDYIFVSQRYPVSDEVYYKTIVETLNQMSLRIEGKIFIKLHPKEMENKNIMSLFLNMVTINPR
26 - FSSF
55 - KLK
70 - MKM
114 - KLLK
135 - SLLS
137 - LSL
154 - TAT
205 - NAN
220 - STS
222 - SQS
271 - KIFIK
272 - IFI
280 - EME
283 - NKN
289 - LFL
>TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNNLLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTIQPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNNLHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
26 - FSSF
55 - KLK
70 - MKM
114 - KLLK
135 - SLLS
137 - LSL
154 - TAT
205 - NAN
220 - STS
222 - SQS
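For readers who want the same record handling without recursive regexes, here is a minimal sketch in Python. It is not a port of the Perl pattern: Python's built-in re module has no recursion, so the sketch compares each candidate substring with its reverse and reports the longest palindrome starting at each 0-based offset, which may not be the same hit the regex picks in every case. The function names parse_fasta and longest_palindrome_at_each_offset and the two demo records are hypothetical, made up for illustration.

```python
def parse_fasta(text):
    """Split FASTA-style text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_palindrome_at_each_offset(seq, min_len=3):
    """Return (offset, palindrome) for the longest palindrome starting at each offset."""
    hits = []
    for start in range(len(seq)):
        # Try the longest candidate first and stop at the first palindrome found.
        for end in range(len(seq), start + min_len - 1, -1):
            piece = seq[start:end]
            if piece == piece[::-1]:
                hits.append((start, piece))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical demo input, not the sequences from the question.
    sample = """>demo1 (hypothetical record)
MIFDASLKKLRK
LFVNPIGFF
>demo2 (hypothetical record)
KIFIKLHP
"""
    for header, seq in parse_fasta(sample):
        print(header)
        print(seq)
        for offset, pal in longest_palindrome_at_each_offset(seq):
            print(f"{offset} - {pal}")
```

Offsets are 0-based, like $-[0] in the Perl program; add 1 if residue numbering should start at 1.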
Answer 1 (score: 0)
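x = "abaasdasdusduhfikliilkjhgjhgjhgh"

def checkpalindrome(s, i):
    # Report s if it is a palindrome longer than two characters.
    if len(s) > 2 and s == s[::-1]:
        print(i, ":", s)

# Try every start offset i and grow the substring one character at a time.
for i in range(len(x)):
    s = ""
    for k in range(i, len(x)):
        s += x[k]
        checkpalindrome(s, i)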
This builds every substring of x and passes each one, together with its start offset, to the palindrome check.
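Testing every substring works for sequences of a few hundred residues, but most of the substrings it checks cannot be palindromes. A possible alternative, sketched below under the assumption that every palindrome of length 3 or more should be listed, grows outward from each center and stops at the first mismatch. The name palindromes_by_center is hypothetical.

```python
def palindromes_by_center(seq, min_len=3):
    """Return sorted (offset, palindrome) pairs for every palindromic substring
    of at least min_len characters. Offsets are 0-based."""
    found = []
    n = len(seq)
    for center in range(n):
        # Odd lengths grow from (center, center); even lengths from (center, center + 1).
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < n and seq[left] == seq[right]:
                if right - left + 1 >= min_len:
                    found.append((left, seq[left:right + 1]))
                left -= 1
                right += 1
    return sorted(found)


if __name__ == "__main__":
    x = "abaasdasdusduhfikliilkjhgjhgjhgh"
    for offset, pal in palindromes_by_center(x):
        print(offset, ":", pal)
```

It reports the same set of (offset, palindrome) pairs as the substring loop above, only ordered by offset and then by text.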