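I am trying to count how many times the same element repeats in an array; if it repeats more than once, I want to print the whole line to a separate file.
My input: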
ATM 4387 FE HEM A 142
ATM 4388 CHA HEM A 142
ATM 4389 CHB HEM A 142
ATM 4431 CHA HEM B 147
ATM 4432 CHB HEM B 147
ATM 4433 CHC HEM B 147
ATM 4434 CHD HEM B 147
ATM 4559 O HOH A 156
ATM 4560 O HOH A 159
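So I put elements [3], [4] and [5] into a separate array, count how many times each combination appears, and set a condition: if it appears more than once, print those lines to a separate file. The other part of the script matches the elements of the @lig array (read from the ligands.txt file) against the elements of the @ligands_pdb array. If there is a match, the element from @ligands_pdb should also be included in the filename.
My @lig array looks like this: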
HEC
HEM
HEP
IGP
IPM
LLP
Because HEM matches, it should also be included in the filename.
The error I currently get is: Use of uninitialized value $ligands_pdb in concatenation (.) or string at example.pl line 58, <$_[...]> line 5436.
#! usr/bin/env perl
use strict;
use warnings;
use autodie;
use 5.010;
use Data::Dumper;
my $data;
my $ligands_pdb;
my @ligands_pdb;
my $ligand_file = 'ligands.txt';
open (LIG, $ligand_file)or die "Cannot open $ligand_file, $!";
my @lig= <LIG>;
close LIG;
#print "@lig\n";
my $flag = 0;
for my $pdb ( glob '*pdb' )
{
    #printf "# %s\n", $pdb;
    open my $fh, "<", $pdb;
    for my $line ( <$fh> )
    {
        chomp( $line );
        if ( $line =~ m/^ATM / )
        {
            my @cols = split ' ', $line;
            #print @cols;
            #print "$cols[3]\n";
            push @ligands_pdb, $cols[3];
            my ($chain_id, $res_no) = ( $cols[4], $cols[5] );
            defined $res_no
                or die "Unable to grok line: $line";
            push @{ $data->{$chain_id}->{$res_no} }, $line;
        }
        foreach (@ligands_pdb)
        {
            if ("@lig" =~/$_/ )
            {
                $flag = 0;
            }
            else
            {
                $flag = 1;
            }
            for my $chain_id ( keys %$data )
            {
                for my $res_no ( keys %{ $data->{$chain_id} } )
                {
                    #print "$chain_id\n";
                    #print "$res_no\n";
                    my @lines = @{ $data->{$chain_id}->{$res_no} };
                    if ( $flag ==0 and scalar @lines > 1 )
                    {
                        open my $out, ">> $ligands_pdb . '#' . $chain_id . '#' . $res_no . '.txt';"; #line 58
                        print $out $_ for (@lines);
                        close $out;
                    }
                    @ligands_pdb = ();
                }
            }
        }
    }
}
I expect 2 files to be created, with the following contents:
HEM#A#142:
ATM 4387 FE HEM A 142
ATM 4388 CHA HEM A 142
ATM 4389 CHB HEM A 142
HEM#B#147:
ATM 4431 CHA HEM B 147
ATM 4432 CHB HEM B 147
ATM 4433 CHC HEM B 147
ATM 4434 CHD HEM B 147
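Answer 0 (score: 1)
I would rewrite your code to use a nested hash to store the lines of each file, keyed on 2 fields (chain ID and residue number). If more than one line is stored under a key, those lines are saved to a new file. I added some debugging output so you can see the flow.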
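The grouping idea on its own can be sketched in a few lines. This is only a minimal illustration, not the full script: the helper name `group_by_chain_and_residue` is hypothetical, and it assumes whitespace-separated columns with the chain ID in column 4 and the residue number in column 5 (zero-based), as in the sample input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of filter.pl): group ATM lines by
# chain ID and residue number in a nested hash of arrays.
sub group_by_chain_and_residue {
    my @lines = @_;
    my %groups;
    for my $line (@lines) {
        next unless $line =~ /^ATM /;
        my @cols = split ' ', $line;
        my ( $chain_id, $res_no ) = @cols[ 4, 5 ];
        next unless defined $res_no;    # skip malformed lines
        # Autovivification creates the inner hash and array on first push.
        push @{ $groups{$chain_id}{$res_no} }, $line;
    }
    return \%groups;
}

my $groups = group_by_chain_and_residue(
    'ATM 4387 FE HEM A 142',
    'ATM 4388 CHA HEM A 142',
    'ATM 4559 O HOH A 156',
);

for my $chain_id ( sort keys %$groups ) {
    for my $res_no ( sort { $a <=> $b } keys %{ $groups->{$chain_id} } ) {
        my @group = @{ $groups->{$chain_id}{$res_no} };
        next unless @group > 1;    # keep only combinations seen more than once
        print "$chain_id/$res_no has ", scalar(@group), " lines\n";
    }
}
```

With the three sample lines above, only the A/142 combination has more than one line, so that is the only group reported.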
<strong>filter.pl</strong>
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use 5.010;
use List::Util qw( uniq );

my $DEBUG = 1;

sub debug {
    my ($msg) = @_;
    print "DEBUG: $msg\n" if $DEBUG;
}

my $ligand_file = 'ligands.txt';
open( my $LIG, '<', $ligand_file ) or die "Cannot open $ligand_file, $!";
debug("Reading ligands file: $ligand_file");
my %ligands_hash;
for my $ligand ( <$LIG> ) {
    chomp( $ligand ); # Remove trailing newline
    $ligands_hash{ $ligand } = 1;
}
close $LIG;
debug("Found ligands: " . join(',',sort keys %ligands_hash));

my %output_files;
my $flag = 0;
for my $pdb ( glob '*pdb' ) {
    my %ligands_found;
    my $data_hash_ref;
    debug("-"x40);
    debug("Working on file $pdb");
    open my $fh, "<", $pdb;
    for my $line (<$fh>) {
        chomp($line);
        if ( $line =~ m/^ATM / ) {
            $line =~ s|\s*$||;
            debug("--> Found an ATM line");
            my @cols = split ' ', $line;
            my ( $ligand, $chain_id, $res_no ) = ( $cols[3], $cols[4], $cols[5] );
            debug("--> Adding ligand $ligand to ligands_found hash");
            $ligands_found{ $ligand }++;
            defined $res_no
                or die "Unable to grok line: $line";
            # This works because perl automatically creates the missing
            # parts of a nested hash (this is known as autovivification).
            # The last part, the array, is also created by the attempt
            # to push onto it, so perl assumes it should exist.
            push @{ $data_hash_ref->{$chain_id}->{$res_no} }, $line;
        }
    }
    debug("Processing ligands");
    for my $ligand (sort keys %ligands_found) {
        # flag = 0 means the ligand is listed in ligands.txt
        $flag = defined $ligands_hash{$ligand} ? 0 : 1;
        debug("--> Ligand $ligand, flag = $flag");
        for my $chain_id ( keys %$data_hash_ref ) {
            for my $res_no ( keys %{ $data_hash_ref->{$chain_id} } ) {
                debug("------> Chain Id = $chain_id, Res No = $res_no");
                my @lines = @{ $data_hash_ref->{$chain_id}->{$res_no} };
                if ( $flag == 0 and scalar @lines > 1 ) {
                    # Output filename based on first ligand with $chain_id and $res_no combo
                    my $id = join ':', $chain_id, $res_no;
                    my $outfile = $output_files{$id} ||= join( '#', $ligand, $chain_id, $res_no ) . '.txt';
                    my $nl = (scalar @lines);
                    my $nl_desc = "$nl line" . ($nl > 1 ? "s" : "");
                    debug("------> Appending $nl_desc to $outfile");
                    # Three-argument open in append mode
                    open my $out, '>>', $outfile;
                    print $out "$_\n" for (uniq @lines);
                    close $out;
                    # Remove the lines so they don't get printed twice.
                    undef @{ $data_hash_ref->{$chain_id}->{$res_no} };
                }
            }
        }
    }
}
<strong>input.pdb</strong>
ATM 4387 FE HEM A 142
ATM 4388 CHA HEM A 142
ATM 4389 CHB HEM A 142
ATM 4431 CHA HEM B 147
ATM 4432 CHB HEM B 147
ATM 4433 CHC IGP B 147
ATM 4434 CHD IGP B 147
ATM 4559 O HOH A 156
ATM 4560 O HOH A 159
<strong>HEM#A#142.txt</strong>
ATM 4387 FE HEM A 142
ATM 4388 CHA HEM A 142
ATM 4389 CHB HEM A 142
<strong>HEM#B#147.txt</strong>
ATM 4431 CHA HEM B 147
ATM 4432 CHB HEM B 147
ATM 4433 CHC IGP B 147
ATM 4434 CHD IGP B 147
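As for the warning quoted in the question: in Perl the scalar `$ligands_pdb` and the array `@ligands_pdb` are two separate variables. The question's script pushes values onto the array but never assigns the scalar, so interpolating the scalar in the `open` call marked `#line 58` triggers "Use of uninitialized value". In that same call the `.` operators sit inside the double-quoted string, so they would end up in the filename instead of concatenating anything. A minimal sketch of the difference follows; the helper name `output_name` is hypothetical and only illustrates building the name from defined values.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $ligands_pdb;                # scalar: declared but never assigned, so undef
my @ligands_pdb = ('HEM');      # array: a completely separate variable

# Hypothetical helper: join the three fields with '#' and add the extension.
sub output_name {
    my ( $ligand, $chain_id, $res_no ) = @_;
    return join( '#', $ligand, $chain_id, $res_no ) . '.txt';
}

# Checking with defined() does not warn; interpolating the undef scalar would.
print defined $ligands_pdb ? "scalar is set\n" : "scalar is undef\n";

# Use an element of the array (or a lexical holding the current ligand) instead.
print output_name( $ligands_pdb[0], 'A', 142 ), "\n";
```

Building the name with `join` outside of any quoted string avoids both problems at once.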