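Can anyone help me? I need a regular expression that matches only lines containing the characters ADFHKMPRTWCEGILNQSVY and NOTHING else.
I need to iterate over lines of text that look like this: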
>gi|46450118|gb|AAS96767.1| femAB family protein [Desulfovibrio vulgaris str. Hildenborough]
MVDLSRKKTQALLPTDILFQTPYWAQVKTRLGMESHAFDIRSSGPWGDVLVLLRRFGRHRVAIVPQGPEV
APPHEDYGVYLESFSLALAEGLGPDVAFIRYDLPWVSPYADEMHDEGWNAFPEARLRELRMNMGTRHWNL
RKSFQDLTVASSLVVDITGEEAAVLERMKPKTRYNIGLARRKGVAVREVGRESLPQFHALYRQTAIRNGF
EPCSITHFSAMFHALCDGAGSTELLFLLATHGTDILAGCIVGLAGRTANFLYGASGNVKRNLMAPYLMHW
TAMCHARDRGCHDYEMGAVPPGHDPAHPFHGLYRFKTGFGGRVALRSGSWDYPLDHAAYRDFCNAESLYR
TDAAPGRTQ
>gi|46450117|gb|AAS96766.1| iron-sulfur protein CooF [Desulfovibrio vulgaris str. Hildenborough]
MNHEELFVIQAEAEKCRACRKCELACIASHNNLTIKEAAKKRTVFAPRVHVVKTDEVKMPVQCRQCKDAP
CARVCPTRALVQDDGVVTMRAQFCAACRLCIMACPYGAISLSFIGLPEEDEAGAMHGREVAVRCDLCSEW
RAREGKSSCACVEACPTKALHMVPLAEARGRHQ
>gi|46450116|gb|AAS96765.1| hydrogenase nickel insertion protein HypA [Desulfovibrio vulgaris str. Hildenborough]
MHEASIVAGIMRIVEEEAARHDVTRIARVRLRVGLLTGVEPRTLTACFELYSEGTVAEGASLDLETVPAL
GTCHACGATFDLHRRCFACPTCGNDDITLEGGRELTIAGLEVPQPEGATA
>gi|46450115|gb|AAS96764.1| carbon monoxide-induced hydrogenase CooH, putative [Desulfovibrio vulgaris str. Hildenborough]
MSTPDSTTQTWTLPVGPLHVALEEPMYFKLDVDGEIVRNVEITAGHVHRGMEALAMRRNLFQNIVLTERV
CSLCSNSHPFTYCMAVEHLAGIEVPARADHLRVVAEEIKRTASHLFNVAILAHIIGFKSLFMHVMEVREI
MQDIKETVYGNRMDLAANCIGGVKYDVDAELLAMLLAGLDKVERNAREIYRIYASDPMVTGRTTGIGVLP
PDEARRFGVVGPVARGSGLAVDVRRDVPYAAYPQLSFDVITEEGCDVRARALVRLREVFESISIIRQCVA
TLPEGAMTVIMPEIPAGQSVARSEAPRGELMYYLRTDGTDIPNRLKWRVPSYMNWDALGVMMRDANVADI
PLIVNSIDPCISCTER
>gi|46450114|gb|AAS96763.1| hydrogenase, CooU subunit, putative [Desulfovibrio vulgaris str. Hildenborough]
MPDNALTAPLATALDALAEAEGFTWTRDAHGNAYGWLRLAERDTLPEAARLLAEGGARLATVTAYDPVRE
PGVPRQEIAYHFDVHGTTLTVTVVLDPECPSVPSITPHFRNADWNEREFMEMYDIAVPGHPNPRRLFLDE
KLDAGIMNTIIPLSTMTNGASTQNLWERILAARPGDKA
>gi|46450113|gb|AAS96762.1| hydrogenase, CooX subunit, putative [Desulfovibrio vulgaris str. Hildenborough]
MFGFLKVLARNVLKGPSTDPFPFAEAHTPARFRGQVRLDPALCVGCAICHHVCAGGAINIAEREDGSGYD
FTVWHNTCALCGLCRHYCPTGAITLSNDWHNAHLQSQKYDWCERQFVPFMQCEGCGAHIRPLPPQLAARA
YGPGGFDFASFMRLCPSCRQLAAARADVHIPEASAMPAAPAGHADEPAIREGDATAVTVKGDETPATGVQ
Q
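The header lines all start with >, so I could look for that. However, I want to be sure I am picking up the right lines, so I would also like a regular expression that matches the lines made up only of ADFHKMPRTWCEGILNQSVY.
Cheers,
Stefan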
Answer 0 (score: 5)
Something like this:
/^[ADFHKMPRTWCEGILNQSVY]+$/
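As a rough sketch of how that pattern might be used, the snippet below wraps it in a small sub and tries it on a few lines. The name `is_sequence_line` and the sample lines are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line consists solely of the 20 letters.
sub is_sequence_line {
    my ($line) = @_;
    return $line =~ /^[ADFHKMPRTWCEGILNQSVY]+$/ ? 1 : 0;
}

my @lines = (
    ">gi|46450118|gb|AAS96767.1| femAB family protein\n",  # header
    "TDAAPGRTQ\n",                                         # sequence
    "TDAAPGRTQ 9\n",                                       # stray space and digit
);

for my $line (@lines) {
    my $verdict = is_sequence_line($line) ? 'sequence' : 'other';
    print "$verdict: $line";
}
```

Because of the `+`, an empty line does not match either, so blank lines between records are skipped as well.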
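Answer 1 (score: 2)
You just need to construct a regular expression that allows one or more of those characters, and nothing else, between the start and the end of the line. Here is a sample script: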
use strict;
use warnings;
while (<DATA>) {
if (/^[ADFHKMPRTWCEGILNQSVY]+$/) {
print $_;
}
}
__DATA__
>gi|46450118|gb|AAS96767.1| femAB family protein [Desulfovibrio vulgaris str. Hildenborough]
MVDLSRKKTQALLPTDILFQTPYWAQVKTRLGMESHAFDIRSSGPWGDVLVLLRRFGRHRVAIVPQGPEV
APPHEDYGVYLESFSLALAEGLGPDVAFIRYDLPWVSPYADEMHDEGWNAFPEARLRELRMNMGTRHWNL
RKSFQDLTVASSLVVDITGEEAAVLERMKPKTRYNIGLARRKGVAVREVGRESLPQFHALYRQTAIRNGF
EPCSITHFSAMFHALCDGAGSTELLFLLATHGTDILAGCIVGLAGRTANFLYGASGNVKRNLMAPYLMHW
TAMCHARDRGCHDYEMGAVPPGHDPAHPFHGLYRFKTGFGGRVALRSGSWDYPLDHAAYRDFCNAESLYR
TDAAPGRTQ
Output:
MVDLSRKKTQALLPTDILFQTPYWAQVKTRLGMESHAFDIRSSGPWGDVLVLLRRFGRHRVAIVPQGPEV
APPHEDYGVYLESFSLALAEGLGPDVAFIRYDLPWVSPYADEMHDEGWNAFPEARLRELRMNMGTRHWNL
RKSFQDLTVASSLVVDITGEEAAVLERMKPKTRYNIGLARRKGVAVREVGRESLPQFHALYRQTAIRNGF
EPCSITHFSAMFHALCDGAGSTELLFLLATHGTDILAGCIVGLAGRTANFLYGASGNVKRNLMAPYLMHW
TAMCHARDRGCHDYEMGAVPPGHDPAHPFHGLYRFKTGFGGRVALRSGSWDYPLDHAAYRDFCNAESLYR
TDAAPGRTQ
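One case the sample data does not exercise is a file saved with Windows line endings: the carriage return is not in the character class, so every sequence line would be rejected. The following is a minimal sketch of one way to guard against that, assuming it is acceptable to strip trailing whitespace before testing. The names `$residues_only` and `clean_sequence_line` are hypothetical, and `\A`/`\z` are used here in place of `^`/`$` so that nothing at all is tolerated after the last letter.

```perl
use strict;
use warnings;

# Same character class as in the script above, written with the x modifier
# so that each part can carry its own comment.
my $residues_only = qr/
    \A                          # start of the string
    [ADFHKMPRTWCEGILNQSVY]+     # one or more of the allowed letters
    \z                          # very end of the string, no newline allowed
/x;

# Hypothetical helper: strip the line ending, then test what is left.
# Returns the cleaned line, or undef if it is not a pure sequence line.
sub clean_sequence_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;         # removes "\n", "\r\n" and trailing blanks
    return $line =~ $residues_only ? $line : undef;
}

for my $raw ("TDAAPGRTQ\n", "TDAAPGRTQ\r\n", ">gi|46450118|gb|AAS96767.1|\n") {
    my $clean = clean_sequence_line($raw);
    print defined $clean ? "kept: $clean\n" : "skipped\n";
}
```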
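Breaking the regular expression down, we have:
^ matches the beginning of the string
[ADFHKMPRTWCEGILNQSVY] matches any one of the characters inside the square brackets
[ADFHKMPRTWCEGILNQSVY]+ matches that one or more times
$ matches the end of the string (or just before a trailing newline, which is why the script works without chomp)
Answer 2 (score: 0)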
My code:
#!/usr/bin/perl
use strict;
use warnings;

my %aminoacids;    # residue letter -> number of occurrences
my %valid = map { $_ => 1 } split //, 'ADFHKMPRTWCEGILNQSVY';

while (<>)
{
    # Keep lines that contain at least one residue letter and none of the
    # characters typical of a header line. The hyphen sits last in the
    # class so that it is a literal "-" rather than part of a range.
    if (/[ADFHKMPRTWCEGILNQSVY]/ and !/[0-9a-z>:;+,.-]/)
    {
        chomp;
        for my $i (0 .. length($_) - 1)
        {
            my $char = substr($_, $i, 1);
            if ($valid{$char})
            {
                $aminoacids{$char}++;
            }
            else
            {
                # Anything outside the 20 letters is reported with its position.
                print "BAD AMINO ACID $i $char $_\n";
            }
        }
    }
}
foreach my $key (sort keys %aminoacids)
{
    print "$key -> $aminoacids{$key}\n";
}
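The tally could also be combined with the anchored pattern from the other answers. The sketch below is only one possible arrangement: `count_residues` is a made-up name, and unlike the code above it silently skips any line that is not made up purely of the 20 letters instead of reporting the offending character.

```perl
use strict;
use warnings;

# Hypothetical helper: tally residue letters over a list of lines.
# Lines that are not made up purely of the 20 letters are skipped.
sub count_residues {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp(my $seq = $line);
        next unless $seq =~ /^[ADFHKMPRTWCEGILNQSVY]+$/;
        $count{$_}++ for split //, $seq;    # one increment per character
    }
    return \%count;
}

my $count = count_residues(
    ">gi|46450116|gb|AAS96765.1| hydrogenase nickel insertion protein HypA\n",
    "GTCHACGATFDLHRRCFACPTCGNDDITLEGGRELTIAGLEVPQPEGATA\n",
);
print "$_ -> $count->{$_}\n" for sort keys %$count;
```

Returning a hash reference keeps the counting separate from the printing, so the same sub could be fed lines read from a file handle.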