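I'm new to Perl and I'm trying to write a program that reads PDB files (from a directory that holds about 3000 files) and saves the output to another directory (a different folder).
Code: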
open( filehandler, "Document1.txt" ) or die $!;    #Input file
my @file1 = <filehandler>;
my $OutputDir = 'C:\test_result_file';
foreach my $line (@file1) {
    chomp $line;
    open( fh, "$line" ) or die $!;
    open( out, ">$OutputDir/$line.pdb" ) or die $!;
    while ( $file = <fh> ) {
        if ( $file =~ /^ATOM.{9}(?:CG|CD1|CD2B|CE1|CE2|CZ|C|O|CB|CG|CD)/ ) {
            $hash{$1}{$2}++;
        }
        foreach $key ( sort { $hash{$1} <=> $hash{$2} or $1 cmp $2 } keys %hash ) {
            print out $key;
        }
    }
    print "Completed", "\n";
}
Example input file:
ATOM 1752 CG TYR A 248 89.088 39.843 51.944 1.00 32.03 C
ATOM 1753 CD1 TYR A 248 89.759 39.356 50.810 1.00 37.15 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1756 CE2 TYR A 248 87.035 39.774 50.706 1.00 35.66 C
ATOM 1757 CZ TYR A 248 87.708 39.285 49.599 1.00 35.16 C
ATOM 7394 C GLN B 331 37.664 74.934 36.854 1.00 22.75 C
ATOM 7395 O GLN B 331 37.728 73.730 36.607 1.00 31.73 O
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7398 CD GLN B 331 35.390 75.877 33.328 1.00 35.70 C
Expected output:
Chain A:
ATOM 1753 CD1 TYR A 248 89.759 39.356 50.810 1.00 37.15 C
ATOM 1752 CG TYR A 248 89.088 39.843 51.944 1.00 32.03 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1753 CD1 TYR A 248 89.759 39.356 50.810 1.00 37.15 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1756 CE2 TYR A 248 87.035 39.774 50.706 1.00 35.66 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1756 CE2 TYR A 248 87.035 39.774 50.706 1.00 35.66 C
ATOM 1757 CZ TYR A 248 87.708 39.285 49.599 1.00 35.16 C
Chain B:
ATOM 7394 C GLN B 331 37.664 74.934 36.854 1.00 22.75 C
ATOM 7395 O GLN B 331 37.728 73.730 36.607 1.00 31.73 O
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7395 O GLN B 331 37.728 73.730 36.607 1.00 31.73 O
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7398 CD GLN B 331 35.390 75.877 33.328 1.00 35.70 C
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7398 CD GLN B 331 35.390 75.877 33.328 1.00 35.70 C
ATOM 7394 C GLN B 331 37.664 74.934 36.854 1.00 22.75 C
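The chain ID can be anything from A to H. The rule, as the expected output above shows, is this: the first four lines are unique; the fifth line then repeats the second line, and so on until one new line is added as the eighth line.
I haven't been able to write code that does this. Any help would be appreciated.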
Answer 0 (score: 0)
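I'm afraid I have to say that your code is rather confusing, but this is what I gather from the sample data: you want a sliding window of four lines within each chain.
The window contents can be stored in a plain Perl array (see @window in the snippet below). You append data to it with push, and when you move on to the next line you drop the oldest line with shift. When the chain changes, print the current window and reset it. The sample code below assumes that chains are not interleaved. If they can be, you need to read all the input first and sort it as needed.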
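For the interleaved case mentioned above, here is a minimal sketch of one way to do it: read everything, group the lines by chain ID in first-seen order, then build the windows per chain. The helper names group_by_chain and sliding_windows and the shortened sample records are made up for illustration. The sketch assumes whitespace-separated fields with the chain ID in the fifth column, as in the sample data.

```perl
use strict;
use warnings;

# Group records by chain ID, remembering the order in which chains first appear.
# Assumption: fields are whitespace-separated and the chain ID is the fifth field.
sub group_by_chain {
    my @lines = @_;
    my (%by_chain, @order);
    for my $line (@lines) {
        my $chain = (split ' ', $line)[4];
        next unless defined $chain;
        push @order, $chain unless exists $by_chain{$chain};
        push @{ $by_chain{$chain} }, $line;
    }
    # One [chain_id, \@lines] pair per chain, in first-seen order.
    return map { [ $_, $by_chain{$_} ] } @order;
}

# Return every run of $len consecutive items as an array reference.
# A list shorter than $len yields a single, shorter window.
sub sliding_windows {
    my ($len, @items) = @_;
    return () unless @items;
    return ([@items]) if @items <= $len;
    return map { [ @items[ $_ .. $_ + $len - 1 ] ] } 0 .. @items - $len;
}

# Hypothetical, shortened records with chains A and B interleaved.
my @records = (
    "ATOM 1 CG TYR A 248",
    "ATOM 2 C GLN B 331",
    "ATOM 3 CD1 TYR A 248",
    "ATOM 4 O GLN B 331",
    "ATOM 5 CD2 TYR A 248",
);

for my $group (group_by_chain(@records)) {
    my ($chain, $lines) = @$group;
    print "Chain $chain:\n";
    # Window length 2 keeps the demo short; the question uses 4.
    for my $window (sliding_windows(2, @$lines)) {
        print "  $_\n" for @$window;
        print "\n";
    }
}
```

Holding a whole file in memory should be fine for a single PDB file. The streaming version with push and shift avoids even that when the chains are already contiguous.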
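use strict;
use warnings;

my $win_len    = 4;     # number of lines in one window
my @window     = ();    # lines currently inside the window
my $prev_chain = "";    # chain ID of the previous matching line

while (<>) {
    next unless /^ATOM/;    # skip anything that is not an ATOM record

    # Third field is the atom name, fifth field is the chain ID.
    my ($atom_name, $chain) = (split)[2, 4];

    # Keep only the atom names of interest (CD2 occurs in the sample data).
    next unless $atom_name =~ /\b(?:CG|CD1|CD2|CE1|CE2|CZ|C|O|CB|CD)\b/;

    if ($chain eq $prev_chain) {
        # Same chain: once the window is full, print it and drop its oldest line.
        if (@window == $win_len) {
            print_window();
            shift @window;
        }
        push @window, $_;
    } else {
        # New chain: flush the last window and start over.
        print_window() if @window;
        @window     = ($_);
        $prev_chain = $chain;
    }
}
print_window() if @window;    # flush the final window

# Print the current window followed by a blank separator line.
sub print_window {
    print foreach @window;
    print "\n";
}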
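To keep things simple, the script reads its data from STDIN and prints the result to STDOUT. Your code sample suggests that you keep the list of files to process in Document1.txt and read the actual input from those files. In that case you need an additional loop: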
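use strict;
use warnings;

my $OutputDir = 'C:/test_result_file';
my $win_len   = 4;    # number of lines in one window

# Document1.txt lists one input file name per line.
open my $dir, '<', "Document1.txt" or die "Failed to open Document1.txt: $!";
chomp(my @files = <$dir>);
close $dir;

foreach my $file (@files) {
    my @window     = ();
    my $prev_chain = "";

    open my $input, '<', $file or die "failed to open $file: $!\n";
    open my $output, '>', "$OutputDir/$file.pdb" or die "failed to open $OutputDir/$file.pdb: $!\n";

    while (<$input>) {
        next unless /^ATOM/;    # skip anything that is not an ATOM record

        # Third field is the atom name, fifth field is the chain ID.
        my ($atom_name, $chain) = (split)[2, 4];
        next unless $atom_name =~ /\b(?:CG|CD1|CD2|CE1|CE2|CZ|C|O|CB|CD)\b/;

        if ($chain eq $prev_chain) {
            # Same chain: once the window is full, print it and drop its oldest line.
            if (@window == $win_len) {
                print_window($output, @window);
                shift @window;
            }
            push @window, $_;
        } else {
            # New chain: flush the last window and start over.
            print_window($output, @window) if @window;
            @window     = ($_);
            $prev_chain = $chain;
        }
    }
    print_window($output, @window) if @window;    # flush the final window

    close $input;
    close $output or die "failed to close $OutputDir/$file.pdb: $!\n";
}

# Print the given lines to the file handle, followed by a blank separator line.
sub print_window {
    my $fh = shift;
    print $fh $_ foreach @_;
    print $fh "\n";
}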
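One caveat about the split-based parsing above: ATOM records in real PDB files are fixed-width, and neighbouring fields can run together (for example a chain ID followed directly by a four-digit residue number), which shifts the field indexes that split returns. A minimal sketch of column-based extraction follows, assuming your files keep the standard layout with the atom name in columns 13-16 and the chain ID in column 22. The name parse_atom_record is hypothetical, and the sample record is the first line of the question's input laid out in those fixed columns.

```perl
use strict;
use warnings;

# Extract the atom name and chain ID from the fixed columns of an ATOM record.
# Assumption: standard PDB layout, atom name in columns 13-16 and chain ID in
# column 22 (substr offsets are zero-based, hence 12 and 21).
sub parse_atom_record {
    my ($line) = @_;
    return unless $line =~ /^ATOM/ && length($line) >= 22;
    my $atom_name = substr($line, 12, 4);
    my $chain     = substr($line, 21, 1);
    $atom_name =~ s/^\s+|\s+$//g;    # trim the padding around the name
    return ($atom_name, $chain);
}

# The first sample line from the question, laid out in fixed columns.
my $record = "ATOM   1752  CG  TYR A 248      89.088  39.843  51.944  1.00 32.03           C";
my ($atom_name, $chain) = parse_atom_record($record);
print "$atom_name $chain\n";
```

If the lines in your files really are separated by single spaces, as they appear in the question, the split version is the one to use.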