How to ... in Perl

Time: 2016-10-11 08:10:51

Tags: perl bioinformatics

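I am trying to print only a predefined sequence of ATOM names, but I am not getting the expected output. I want to print the input file in the format of the expected output below. The chain ID can be anything from A to H.

Code: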

use strict;
use warnings;

my $OutputDir = 'C:\test_result_file';

# Document1.txt lists the input files, one name per line.
open my $dir, '<', 'Document1.txt' or die "Failed to open Document1.txt: $!";
chomp(my @files = <$dir>);

foreach my $file (@files) {
    my $win_len    = 4;
    my @window     = ();
    my $prev_chain = "";

    open my $input, '<', $file or die "failed to open $file: $!\n";
    open my $output, '>', "$OutputDir/$file" or die "failed to open $OutputDir/$file: $!\n";
    while (<$input>) {
        my ($atom_name, $chain) = (split)[2, 4];
        # Keep only the five atom names of interest.
        next unless $atom_name =~ /\b(?:C4B|O4B|C1B|C2B|C3B)\b/;
        if ($chain eq $prev_chain) {
            if (@window == $win_len) {
                print_window($output, @window);
                shift @window;
            }
            push @window, $_;
        } else {
            print_window($output, @window) if @window;
            @window = ($_);
            $prev_chain = $chain;
        }
    }
    print_window($output, @window) if @window;
}

# Print one window of lines followed by a blank separator line.
sub print_window {
    my $fh = shift;
    print $fh $_ foreach @_;
    print $fh "\n";
}

Input file:

HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C  
HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O  
HETATM10912  C3B NAD A 363      60.243 -57.426 148.473  1.00 40.37           C  
HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C  
HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C  
HETATM10954  C4B NAD B 363      41.496 -54.407 140.932  1.00 39.26           C  
HETATM10955  O4B NAD B 363      41.936 -54.715 139.568  1.00 41.96           O  
HETATM10956  C3B NAD B 363      42.061 -55.476 141.894  1.00 37.13           C  
HETATM10958  C2B NAD B 363      42.883 -56.336 140.942  1.00 38.13           C  
HETATM10960  C1B NAD B 363      42.233 -56.127 139.593  1.00 42.92           C 

Expected output:

Chain A:

HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C  
HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O  
HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C  
HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C 

HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O  
HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C  
HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C 
HETATM10912  C3B NAD A 363      60.243 -57.426 148.473  1.00 40.37           C   

HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C  
HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C 
HETATM10912  C3B NAD A 363      60.243 -57.426 148.473  1.00 40.37           C   
HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C 

HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C 
HETATM10912  C3B NAD A 363      60.243 -57.426 148.473  1.00 40.37           C   
HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C 
HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O  

HETATM10912  C3B NAD A 363      60.243 -57.426 148.473  1.00 40.37           C   
HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C 
HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O
HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C    

Chain B:

HETATM10954  C4B NAD B 363      41.496 -54.407 140.932  1.00 39.26           C  
HETATM10955  O4B NAD B 363      41.936 -54.715 139.568  1.00 41.96           O    
HETATM10960  C1B NAD B 363      42.233 -56.127 139.593  1.00 42.92           C 
HETATM10958  C2B NAD B 363      42.883 -56.336 140.942  1.00 38.13           C

HETATM10955  O4B NAD B 363      41.936 -54.715 139.568  1.00 41.96           O    
HETATM10960  C1B NAD B 363      42.233 -56.127 139.593  1.00 42.92           C 
HETATM10958  C2B NAD B 363      42.883 -56.336 140.942  1.00 38.13           C
HETATM10956  C3B NAD B 363      42.061 -55.476 141.894  1.00 37.13           C  

HETATM10960  C1B NAD B 363      42.233 -56.127 139.593  1.00 42.92           C 
HETATM10958  C2B NAD B 363      42.883 -56.336 140.942  1.00 38.13           C
HETATM10956  C3B NAD B 363      42.061 -55.476 141.894  1.00 37.13           C  
HETATM10954  C4B NAD B 363      41.496 -54.407 140.932  1.00 39.26           C  

HETATM10958  C2B NAD B 363      42.883 -56.336 140.942  1.00 38.13           C
HETATM10956  C3B NAD B 363      42.061 -55.476 141.894  1.00 37.13           C  
HETATM10954  C4B NAD B 363      41.496 -54.407 140.932  1.00 39.26           C  
HETATM10955  O4B NAD B 363      41.936 -54.715 139.568  1.00 41.96           O    

HETATM10956  C3B NAD B 363      42.061 -55.476 141.894  1.00 37.13           C  
HETATM10954  C4B NAD B 363      41.496 -54.407 140.932  1.00 39.26           C  
HETATM10955  O4B NAD B 363      41.936 -54.715 139.568  1.00 41.96           O    
HETATM10960  C1B NAD B 363      42.233 -56.127 139.593  1.00 42.92           C 

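Description: I want to order the HETATM records by predefined ATOM names (e.g. C4B, O4B, C1B, C2B, etc.). So far I have the script above, so could anyone help me solve this? My current script produces output in the same format, but not the expected result.

I do not want separate files for chain A, chain B, or any other chain ID. I want to order the ATOM names according to my predefined sequences.

My sequences are: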

C4B-O4B-C1B-C2B
O4B-C1B-C2B-C3B
C1B-C2B-C3B-C4B
C2B-C3B-C4B-O4B
C3B-C4B-O4B-C1B
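
The five sequences above look like every window of four consecutive names taken cyclically from the order C4B, O4B, C1B, C2B, C3B. A minimal sketch of generating them, assuming that reading is right; the helper name cyclic_windows is made up for illustration:

```perl
use strict;
use warnings;

# Ring order inferred from the five sequences in the question (an assumption).
my @ring    = qw(C4B O4B C1B C2B C3B);
my $win_len = 4;

# Hypothetical helper: return every cyclic window of $len names over @order.
sub cyclic_windows {
    my ($len, @order) = @_;
    my $n = scalar @order;
    my @windows;
    for my $start (0 .. $n - 1) {
        # Wrap around the end of the list with the modulus.
        push @windows, [ map { $order[ ($start + $_) % $n ] } 0 .. $len - 1 ];
    }
    return @windows;
}

print join('-', @$_), "\n" for cyclic_windows($win_len, @ring);
```

Generating the windows this way avoids typing the twenty names by hand, which is where duplicated or missing entries tend to creep in.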

e.g., first row: C4B
HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C  

Second row: O4B
HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O  
Third Row: C1B
HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C
Fourth Row: C2B  
HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C 
Fifth Row: O4B
HETATM10911  O4B NAD A 363      61.320 -59.488 148.275  1.00 43.48           O 
Sixth Row: C1B
HETATM10916  C1B NAD A 363      61.394 -58.766 147.056  1.00 43.29           C  
Seventh Row: C2B
HETATM10914  C2B NAD A 363      60.167 -57.970 147.054  1.00 40.90           C 
Eighth Row: C3B
HETATM10912  C3B NAD A 363      60.243 -57.426 148.473  1.00 40.37           C   
.
.
.
so on

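The format is the same for chain B and the other chains.

This means each line is needed multiple times. All of the atom names above should be taken from the input file, chain by chain. We need to copy the lines for all of these atom names and then paste them in the order given above.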

1 Answer:

Answer 0: (score: 0)

It looks to me like your error comes from this line:

my ($atom_name, $chain) = (split)[2, 4];

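This puts the 3rd column in $atom_name and the 5th column in $chain. In the sample input those columns hold the residue name (NAD) and the residue number (363), because HETATM and the five-digit serial number run together into a single field.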
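
A quick way to check this is to print the first few fields that split produces for the first record of the sample input. This is only an illustrative check, not part of the original script:

```perl
use strict;
use warnings;

# First record of the sample input; the record name and serial number are fused.
my $line = 'HETATM10910  C4B NAD A 363      60.856 -58.575 149.282  1.00 40.44           C';

# split ' ' splits on runs of whitespace, like the bare split in the script.
my @fields = split ' ', $line;
printf "%d: %s\n", $_, $fields[$_] for 0 .. 4;
# 0: HETATM10910
# 1: C4B
# 2: NAD
# 3: A
# 4: 363
```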

I think you want:

my ($atom_name, $chain) = (split)[1, 3];

For the first line you will then get:

$atom_name = C4B
$chain = A
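
Note that the field positions from split depend on the serial number being five digits wide; with a shorter serial, HETATM becomes its own field and the indexes shift. Reading the fixed PDB columns (atom name in columns 13-16, chain ID in column 22) avoids that. Below is a rough sketch of one way to produce the expected output, not a definitive solution. It assumes the ring order C4B, O4B, C1B, C2B, C3B inferred from the question, at most one such ring per chain (a second residue in the same chain would overwrite the first), and a hypothetical helper name windows_by_chain and script name reorder.pl:

```perl
use strict;
use warnings;

# Ring order and window length inferred from the question (assumptions).
my @ring    = qw(C4B O4B C1B C2B C3B);
my $win_len = 4;

# Hypothetical helper: takes PDB record lines, returns the reordered text.
sub windows_by_chain {
    my @lines = @_;
    my (%by_chain, @chain_order);

    for my $line (@lines) {
        next unless $line =~ /^HETATM/ && length($line) >= 22;
        chomp(my $rec = $line);

        # Fixed PDB columns: atom name in 13-16, chain ID in 22.
        (my $atom = substr($rec, 12, 4)) =~ s/\s+//g;
        my $chain = substr($rec, 21, 1);
        next unless grep { $_ eq $atom } @ring;

        # Remember chains in order of first appearance.
        push @chain_order, $chain unless exists $by_chain{$chain};
        $by_chain{$chain}{$atom} = "$rec\n";
    }

    my $n   = scalar @ring;
    my $out = '';
    for my $chain (@chain_order) {
        for my $start (0 .. $n - 1) {
            # Names of this cyclic window, then the matching records.
            my @names = map { $ring[ ($start + $_) % $n ] } 0 .. $win_len - 1;
            my @recs  = grep { defined $_ } @{ $by_chain{$chain} }{@names};
            next unless @recs == @names;    # skip windows with a missing atom
            $out .= join('', @recs) . "\n";
        }
    }
    return $out;
}

# Usage: perl reorder.pl input.pdb > output.pdb
print windows_by_chain(<>) if @ARGV;
```

Because the records are stored per chain and looked up by name, each line can be printed as many times as the windows require, regardless of the order in which the atoms appear in the input file.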