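With jaypal's help on my previous question (https://stackoverflow.com/a/25735444/3767980), I was able to set up my restraints for both the ambiguous and the unambiguous cases. Let's consider the ambiguous case here, since it is the harder one.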
I have restraints like these:
G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB
L12N-T11C-?: (128.977, 175.109, 174.412) K158C/H60C/A152C/N127C/Y159C(notH60C)
K14N-E13C-?: (117.377, 176.474, 29.823) I187CG1/V78CG2
A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1
and the following Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
#
open my $fh, '<', $ARGV[0];
while (<$fh>) {
    my @values = map { /.(\d+)(\w+)/; $1, $2 } split '/', (split)[-1];
    my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
    print "assign (resid $resid and name $name ) (";
    print join ( " or ",
        map { "resid $values[$_] and name $values[$_ + 1]" }
        grep { not $_ % 2 } 0 .. $#values
    );
    print " ) 3.5 2.5 4.5 ! $_";
}
with this output:
assign (resid 5 and name C ) (resid 87 and name CD or resid 85 and name CB or resid 94 and name CB or resid 32 and name CB or resid 21 and name CB ) 3.5 2.5 8.5 ! G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB
assign (resid 11 and name C ) (resid 158 and name C or resid 60 and name C or resid 152 and name C or resid 127 and name C or resid 159 and name C ) 3.5 2.5 8.5 ! L12N-T11C-?: (128.977, 175.109, 174.412) K158C/H60C/A152C/N127C/Y159C(notH60C)
assign (resid 13 and name C ) (resid 187 and name CG1 or resid 78 and name CG2 ) 3.5 2.5 8.5 ! K14N-E13C-?: (117.377, 176.474, 29.823) I187CG1/V78CG2
assign (resid 74 and name C ) (resid 131 and name CG1 or resid 135 and name CG2 or resid 78 and name CG1 ) 3.5 2.5 8.5 ! A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1
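I now need to treat specially the entries after the ! that start with V, followed by 2 or 3 digits, followed by CG1 or CG2. Examples are V78CG2 or V135CG1. For these entries the atom name in the assign statement should become CG*, so the desired output is:
assign (resid 5 and name C ) (resid 87 and name CD or resid 85 and name CB or resid 94 and name CB or resid 32 and name CB or resid 21 and name CB ) 3.5 2.5 8.5 ! G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB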
assign (resid 11 and name C ) (resid 158 and name C or resid 60 and name C or resid 152 and name C or resid 127 and name C or resid 159 and name C ) 3.5 2.5 8.5 ! L12N-T11C-?: (128.977, 175.109, 174.412) K158C/H60C/A152C/N127C/Y159C(notH60C)
assign (resid 13 and name C ) (resid 187 and name CG1 or resid 78 and name CG* ) 3.5 2.5 8.5 ! K14N-E13C-?: (117.377, 176.474, 29.823) I187CG1/V78CG2
assign (resid 74 and name C ) (resid 131 and name CG* or resid 135 and name CG* or resid 78 and name CG* ) 3.5 2.5 8.5 ! A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1
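I need suggestions on how to select the matching lines and then apply the transformation to the generated statement (the part before the !). I can find the matching lines with the basic regular expression V.*CG[1-2].
I would like a solution that fits into the Perl script above.
If anything is unclear, please leave a comment. I am still fairly new to this. Thanks in advance for your suggestions.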
Answer 0 (score: 1)
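Here is a modified version of the script, with an explanation of what is going on. The line
my @values = map { ... } split '/', (split)[-1];
is a bit hard to follow, so I will explain it piece by piece.
map takes an array, applies whatever is inside the braces to each member of the array, and outputs a new array. The two splits are used to cut up the line. If used without any arguments, split takes $_ as its input and splits on whitespace. So the first split takes $_, which is the current line, and splits it on whitespace: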
input:
'G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB'
the array created by calling split:
'G6N-D5C-?:', '(116.663,', '177.052,', '29.149)', 'K87CD/E85CB/E94CB/H32CB/Q21CB'
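As a quick check of the behaviour described above, the following sketch calls split with no arguments on one of the sample lines. The variable names ($line, @fields, $last) are only for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# One of the sample input lines from the question.
my $line = 'G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB';

my @fields;
my $last;
for ($line) {               # alias $_ to $line, as the while loop does for each input line
    @fields = split;        # no arguments: split $_ on whitespace
    $last   = (split)[-1];  # last element of that same list
}

print scalar(@fields), " fields\n";   # prints "5 fields"
print "$last\n";                      # prints "K87CD/E85CB/E94CB/H32CB/Q21CB"
```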
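The second split cuts its input on /. As input it uses the last item of the array created by the first split: (split) is shorthand for "the array created by splitting $_ on whitespace", and (split)[-1] is the last element of that array.
input:
K87CD/E85CB/E94CB/H32CB/Q21CB
array created by calling `split "/"`
'K87CD', 'E85CB', 'E94CB', 'H32CB', 'Q21CB'
The map command then applies a regular expression to each member of this array:
/.(\d+)(\w+)/; # match any character (.) followed by one or more digits (\d)
# followed by one or more alphanumeric (\w) characters.
The brackets capture the results into the read-only variables $1 and $2. The second statement in the map block adds these values to the array created by the map command. By default, perl puts the result of the last statement in the block into the array, so you can do something like this:
my @arr = (1, 2, 3, 4);
my @two_times = map { $_ * 2 } @arr;
# @two_times is (2, 4, 6, 8)
(The "result" of the pattern match in list context is actually $1 and $2, so the statement $1, $2 that adds them to the @values array is not strictly necessary.)
So @values = map { /.(\d+)(\w+)/; $1, $2 } @array grabs the matches for each element of @array and puts them into @values.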
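To illustrate the remark that the explicit $1, $2 is optional, here is a small sketch comparing both forms on a few of the sample entries. The names @entries, @explicit and @implicit are made up for this example.

```perl
use strict;
use warnings;

my @entries = ('K87CD', 'E85CB', 'Q21CB');

# Version with an explicit "$1, $2" as the last statement of the block.
my @explicit = map { /.(\d+)(\w+)/; $1, $2 } @entries;

# Version relying on the match itself: in list context a successful
# match returns the captured groups.
my @implicit = map { /.(\d+)(\w+)/ } @entries;

print "@explicit\n";   # prints "87 CD 85 CB 21 CB"
print "@implicit\n";   # prints "87 CD 85 CB 21 CB"
```

Both forms produce the same flat list of number/name pairs for input that matches. If an entry does not match, the two can differ: the implicit form contributes nothing for that entry, while the explicit form still emits whatever $1 and $2 currently hold.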
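I hope the rest of the script is understandable. If not, I suggest pulling each command apart and using Data::Dumper to inspect the results, so you can figure out what is going on.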
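A minimal sketch of that approach, splitting the one-liner into separate steps and dumping each intermediate array. The intermediate variable names are invented for the example; only Data::Dumper and its Dumper function come from the core module.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;   # optional: slightly more compact output

my $line = 'K14N-E13C-?: (117.377, 176.474, 29.823) I187CG1/V78CG2';

my @fields  = split ' ', $line;                 # step 1: split on whitespace
my @entries = split '/', $fields[-1];           # step 2: split the last field on /
my @values  = map { /.(\d+)(\w+)/ } @entries;   # step 3: capture number and atom name

# Dump all three arrays to see what each step produced.
print Dumper(\@fields, \@entries, \@values);
```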
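To change the script so that it treats the VnnCG1 / VnnCG2 entries differently, I added a line to the map command that finds any residue matching that pattern and replaces it with VnnCG*. I then changed the matching regex so that it picks up the appropriate part of the residue name but does not pick up inappropriate data (such as (notB28DG)). Here is the new script with comments: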
#!/usr/bin/perl
use strict;
use warnings;
use feature ':5.10';
use autodie;
open my $fh, '<', $ARGV[0];
while (<$fh>) {
    # remove the trailing newline; "say" adds one back when printing
    chomp;
    # a brief guide to regexps:
    # \d = digits
    # \w = digits or letters or _
    # [ ] = match any of the characters within these brackets
    # ( ) = capture the value in these brackets, save it to $1, $2, $3, etc.
    # (brackets are also used for grouping alternations, but not in this case)
    # * = match 0 or more times
    # + = match 1 or more times
    # \* = match the character *
    # s/ / / = search and replace
    # /x = ignore whitespace (and allow comments) in the pattern
    my @values = map {
        # find the pattern
        s/V         # V
        (\d+)       # one or more digits; the brackets mean we capture the value
                    # and it gets saved in $1
        CG          # CG
        [12]        # either 1 or 2
        /V$1CG*/x;  # replace with V $1 CG *
        # find the pattern
        /.                  # any character
        (\d+)               # one or more digits; capture the value in $1
        ([A-Z][\w\*]*)      # a letter followed by zero or more alphanum or *
        /x;                 # the value is captured in $2
        # put $1 and $2 into the array we're building
        $1, $2
    } split '/', (split)[-1];
    my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
    # compose the new string
    my $str = "assign (resid $resid and name $name ) ("
        . join ( " or ",
            map { "resid $values[$_] and name $values[$_ + 1]" }
            grep { not $_ % 2 } 0 .. $#values
        )
        . " ) 3.5 2.5 8.5 ! $_";
    # "say" prints the string to STDOUT and automatically appends a newline
    say $str;
}
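To see the two regular expressions from the script in isolation, the following sketch applies them to a handful of entries taken from the sample data. The loop and the variable names ($entry, $copy, @pairs) are illustrative only.

```perl
use strict;
use warnings;

my @entries = ('V78CG2', 'V135CG1', 'I187CG1', 'Y159C(notH60C)');
my @pairs;

for my $entry (@entries) {
    my $copy = $entry;                  # work on a copy; keep the original intact
    $copy =~ s/V(\d+)CG[12]/V$1CG*/;    # only V..CG1 / V..CG2 entries are rewritten
    # capture the residue number and the atom name (which may now end in *)
    my ($resid, $name) = $copy =~ /.(\d+)([A-Z][\w\*]*)/;
    push @pairs, "$resid $name";
    print "$entry -> resid $resid and name $name\n";
}
# prints:
# V78CG2 -> resid 78 and name CG*
# V135CG1 -> resid 135 and name CG*
# I187CG1 -> resid 187 and name CG1
# Y159C(notH60C) -> resid 159 and name C
```

The character class [\w\*] is what lets the second pattern keep the trailing * that the substitution introduces. The match still stops at the opening bracket of (notH60C).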
A shorter version of the 'core' of the script, without comments (assuming @data holds the input lines):
foreach (@data) {
    my @values = map {
        s/V(\d+)CG[12]/V$1CG*/; /.(\d+)([A-Z][\w\*]*)/;
    } split '/', (split)[-1];
    my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
    say "assign (resid $resid and name $name ) ("
        . join ( " or ",
            map { "resid $values[$_] and name $values[$_ + 1]" }
            grep { not $_ % 2 } 0 .. $#values
        )
        . " ) 3.5 2.5 8.5 ! $_";
}
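The short version can be tried without an input file by filling @data by hand. The sketch below assumes that @data holds input lines with their newlines already removed. The @out array is added here only so the result can be inspected.

```perl
use strict;
use warnings;
use feature 'say';

# Assumption: @data holds the input lines, without trailing newlines.
my @data = (
    'K14N-E13C-?: (117.377, 176.474, 29.823) I187CG1/V78CG2',
    'A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1',
);

my @out;
foreach (@data) {
    # rewrite V..CG1/CG2 to V..CG*, then collect (number, name) pairs
    my @values = map {
        s/V(\d+)CG[12]/V$1CG*/; /.(\d+)([A-Z][\w\*]*)/;
    } split '/', (split)[-1];
    my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
    push @out, "assign (resid $resid and name $name ) ("
        . join( " or ",
            map { "resid $values[$_] and name $values[$_ + 1]" }
            grep { not $_ % 2 } 0 .. $#values
        )
        . " ) 3.5 2.5 8.5 ! $_";
}

say for @out;
```

For these two lines the output matches the last two lines of the desired output shown in the question.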