Print and use data from two files simultaneously

Time: 2017-04-15 03:39:36

Tags: perl file file-handling

pdb1.pdb

ATOM    709  CA  THR    25     -29.789  33.001  72.164  1.00  0.00
ATOM    711  CB  THR    25     -29.013  31.703  72.370  1.00  0.00
ATOM    734  CG  THR    25     -29.838  30.458  72.573  1.00  0.00
ATOM    768  CE  THR    25     -28.541  28.330  71.361  1.00  0.00

pdb2.pdb

ATOM    765  N   ALA    25     -30.838  33.150  73.195  1.00  0.00
ATOM    764  N   LEU    26     -29.457  33.193  69.767  1.00  0.00
ATOM    783  N   VAL    27     -30.286  31.938  66.438  1.00  0.00
ATOM    798  N   GLY    28     -28.076  30.044  64.519  1.00  0.00

and so on.

Desired output

709 CA 765 N 1.477 -29.789 33.001 72.164 -30.838 33.150 73.195
709 CA 764 N 2.427 -29.789 33.001 72.164 -29.457 33.193 69.767
709 CA 783 N 5.844 -29.789 33.001 72.164 -30.286 31.938 66.438

The idea is to read the values in columns 2, 3, 6, 7 and 8 of each line of pdb1.pdb, pair that line with every line of pdb2.pdb, and use columns 6, 7 and 8 (the x, y and z coordinates) to calculate the distance between the two atoms.

I tried this, but the output doesn't get printed.


2 Answers:

Answer 0 (score: 1)

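It's probably simplest to read both files into memory, unless they are very large.

This solution calls a subroutine read_file that builds an array of hashes holding the five fields of interest from each file. It then calculates the coordinate deltas and the distance for every pair of records, and formats the output data.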

use strict;
use warnings 'all';

my $f1 = read_file('file1.txt');
my $f2 = read_file('file2.txt');

for my $r1 ( @$f1 ) {

    for my $r2 ( @$f2 ) {

        my ($dx, $dy, $dz) = map { $r1->{$_} - $r2->{$_} } qw/ x y z /;
        my $delta = sqrt( $dx * $dx + $dy * $dy + $dz * $dz );

        my @rec = (
            @{$r1}{qw/ id name /},
            @{$r2}{qw/ id name /},
            sprintf('%5.3f', $delta),
            @{$r1}{qw/ x y z /},
            @{$r2}{qw/ x y z /},
        );

        print "@rec\n";
    }
}

sub read_file {
    my ($file_name) = @_;

    open my $fh, '<', $file_name or die qq{Unable to open "$file_name" for input: $!};

    my @records;

    while ( <$fh> ) {
        next unless /\S/;
        my %record;
        @record{qw/ id name x y z /} = (split)[1,2,5,6,7];
        push @records, \%record;
    }

    \@records;
}

Output

709 CA 765 N 1.478 -29.789 33.001 72.164 -30.838 33.150 73.195
709 CA 764 N 2.427 -29.789 33.001 72.164 -29.457 33.193 69.767
709 CA 783 N 5.845 -29.789 33.001 72.164 -30.286 31.938 66.438
709 CA 798 N 8.374 -29.789 33.001 72.164 -28.076 30.044 64.519
711 CB 765 N 2.471 -29.013 31.703 72.370 -30.838 33.150 73.195
711 CB 764 N 3.032 -29.013 31.703 72.370 -29.457 33.193 69.767
711 CB 783 N 6.072 -29.013 31.703 72.370 -30.286 31.938 66.438
711 CB 798 N 8.079 -29.013 31.703 72.370 -28.076 30.044 64.519
734 CG 765 N 2.938 -29.838 30.458 72.573 -30.838 33.150 73.195
734 CG 764 N 3.937 -29.838 30.458 72.573 -29.457 33.193 69.767
734 CG 783 N 6.327 -29.838 30.458 72.573 -30.286 31.938 66.438
734 CG 798 N 8.255 -29.838 30.458 72.573 -28.076 30.044 64.519
768 CE 765 N 5.646 -28.541 28.330 71.361 -30.838 33.150 73.195
768 CE 764 N 5.199 -28.541 28.330 71.361 -29.457 33.193 69.767
768 CE 783 N 6.348 -28.541 28.330 71.361 -30.286 31.938 66.438
768 CE 798 N 7.069 -28.541 28.330 71.361 -28.076 30.044 64.519
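As a check on the first row of the output above, here is the distance calculation for 709 CA and 765 N on its own. This is a minimal sketch: the subroutine name distance is hypothetical, and the coordinates are copied from the two input files. It computes the Euclidean distance sqrt(dx² + dy² + dz²), the same formula as the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: Euclidean distance between two points given as [x, y, z]
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->[$_] - $q->[$_])**2 for 0 .. 2;   # add dx^2, dy^2, dz^2
    return sqrt($sum);
}

# Coordinates of atom 709 CA (pdb1.pdb) and atom 765 N (pdb2.pdb)
my @ca = (-29.789, 33.001, 72.164);
my @n  = (-30.838, 33.150, 73.195);

# dx = 1.049, dy = -0.149, dz = -1.031, so the sum of squares is about 2.1856
printf "%.3f\n", distance(\@ca, \@n);   # prints 1.478
```

Rounded to three decimals this gives the 1.478 shown in the first output row.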

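Answer 1 (score: 0)

Your code has a lot of syntax errors. I've made some changes to it that should get you started toward what you want.

First, add use strict and use warnings; that alone gets rid of a lot of noise.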
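To see what use strict catches, here is a small sketch, assuming a deliberately misspelled variable ($cuont is a made-up typo for $count). The code is compiled inside a string eval so that the compile-time error can be captured and printed instead of stopping the script.

```perl
use strict;
use warnings;

# A deliberate typo ($cuont instead of $count) inside a string eval.
# Under "use strict" this fails to compile instead of silently
# creating a new global variable.
my $ok  = eval q{ use strict; my $count = 1; $cuont++; 1 };
my $err = $@;    # save the error before anything else can reset $@

if ($ok) {
    print "compiled and ran\n";
} else {
    print "rejected by strict: $err";
}
```

Without strict, the typo would simply create a new variable and the bug would go unnoticed.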

use strict;
use warnings;

open(my $f1, "pdb1.pdb") or die $!;    
open(my $f2, "pdb2.pdb") or die $!;

while(defined(my $line1 = <$f1>) and defined(my $line2 = <$f2>))
{
   # print "Iam here";
   my  @splitted = split(' ',$line1);

    my @fields = split / /, $line1;

    #print $fields[1], "\n";

    my $atom1 = @{[$line1 =~ m/\S+/g]}[2];
    my $no1   = @{[$line1 =~ m/\w+/g]}[3];

    my $x1 = @{[$line1 =~ m/\w+/g]}[6];
    my $y1 = @{[$line1 =~ m/\w+/g]}[7];
    my $z1 = @{[$line1 =~ m/\w+/g]}[8];

    my $atom2 = @{[$line2 =~ m/\w+/g]}[2];
    my $no2   = @{[$line2 =~ m/\w+/g]}[3];

    my $x2 = @{[$line2 =~ m/\w+/g]}[6];
    my $y2 = @{[$line2 =~ m/\w+/g]}[7];
    my $z2 = @{[$line2 =~ m/\w+/g]}[8];

    #print $atom1;

    for ($f1, $f2) { 
        print "$atom1 $no1 $x1 $y1 $z1 $atom2 $no2 $x2 $y2 $z2 \n"; 
    }
}

close ($f1);
close ($f2);

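Now, to your question: your expected output seems to differ from what your logic does. You loop over both files at the same time, which makes a single pass pairing each line of file1 with the corresponding line of file2, instead of pairing each line of file1 with every line of file2. So I think you need to look at the looping part.

The next thing you need to know about is splitting the columns.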
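The difference between the two loop shapes can be sketched with small arrays standing in for the file lines. The data and the subroutine names lockstep_pairs and all_pairs are hypothetical; the point is only how many pairs each loop shape produces.

```perl
use strict;
use warnings;

# Illustrative stand-ins for the lines of pdb1.pdb and pdb2.pdb
my @file1 = qw(A1 A2 A3);
my @file2 = qw(B1 B2);

# Lockstep: line i of file1 is paired only with line i of file2,
# and the loop stops as soon as the shorter list runs out
sub lockstep_pairs {
    my ($l1, $l2) = @_;
    my @pairs;
    for my $i (0 .. $#$l1) {
        last if $i > $#$l2;
        push @pairs, "$l1->[$i]-$l2->[$i]";
    }
    return @pairs;
}

# Nested loops: every line of file1 is paired with every line of file2
sub all_pairs {
    my ($l1, $l2) = @_;
    my @pairs;
    for my $left (@$l1) {
        for my $right (@$l2) {
            push @pairs, "$left-$right";
        }
    }
    return @pairs;
}

print join(' ', lockstep_pairs(\@file1, \@file2)), "\n";  # A1-B1 A2-B2
print join(' ', all_pairs(\@file1, \@file2)), "\n";       # A1-B1 A1-B2 A2-B1 A2-B2 A3-B1 A3-B2
```

The desired output in the question has the second shape: 709 CA appears once for every line of pdb2.pdb.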

@splitted = split(' ',$line1);

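If you split a line as above, you get all of its columns in the array. Column 1 is at index 0, column 2 at index 1, and so on.

So for the first column you would write

my $col1 = $splitted[0];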
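Here is a short sketch of the column indexing on one line copied from pdb1.pdb. It also shows why split ' ' (used for @splitted) and split / / (used for @fields in the code above) are not interchangeable: only the first treats a run of spaces as a single separator.

```perl
use strict;
use warnings;

my $line = "ATOM    709  CA  THR    25     -29.789  33.001  72.164  1.00  0.00\n";

# split ' ' is a special case: it ignores leading whitespace and splits on
# runs of whitespace (including the trailing newline), so column N of the
# line ends up at index N - 1
my @cols = split ' ', $line;
print "$cols[1] $cols[2] $cols[5] $cols[6] $cols[7]\n";   # 709 CA -29.789 33.001 72.164

# split / / splits on every single space, so runs of spaces produce
# empty fields and the indices no longer line up with the columns
my @raw = split / /, $line;
print "split / / gives field 1 = '$raw[1]'\n";            # prints: split / / gives field 1 = ''
```

So columns 2, 3, 6, 7 and 8 from the question are indices 1, 2, 5, 6 and 7 after split ' '.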

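If you are using those regexes to extract the columns, you don't need them: you have already split the line, and each column is a separate element of the array.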
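The regexes are not only redundant but also give wrong values for the coordinates. In this sketch on one line from pdb1.pdb, \w+ breaks "-29.789" into "29" and "789", so an expression like @{[$line1 =~ m/\w+/g]}[6] from the code above picks up 789 instead of a coordinate. The \S+ pattern, which the code already uses for $atom1, keeps each column whole.

```perl
use strict;
use warnings;

my $line = "ATOM    709  CA  THR    25     -29.789  33.001  72.164  1.00  0.00";

# \w matches only letters, digits and underscore, so "-29.789" is broken
# into "29" and "789" and the minus sign is lost
my @words = $line =~ /\w+/g;
print "@words[5 .. 7]\n";    # 29 789 33

# \S+ keeps each whitespace-separated column intact, giving the same
# tokens as split ' '
my @tokens = $line =~ /\S+/g;
print "@tokens[5 .. 7]\n";   # -29.789 33.001 72.164
```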

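Update

The problem you are running into comes from iterating over the filehandles themselves. A filehandle can only be read through once: after an inner loop has reached the end of pdb2.pdb, later passes of the outer loop find nothing left to read. Iterating over arrays avoids this.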

use strict;
use warnings;

# Read both files into arrays: an array can be looped over any number
# of times, whereas a filehandle is exhausted after one pass
open(my $f1, '<', 'pdb1.pdb') or die "Cannot open pdb1.pdb: $!";
open(my $f2, '<', 'pdb2.pdb') or die "Cannot open pdb2.pdb: $!";
my @in1 = <$f1>;
my @in2 = <$f2>;
close($f1);
close($f2);

foreach my $line1 (@in1) {       # use the array, not the filehandle, to iterate
    next unless $line1 =~ /\S/;  # skip blank lines

    # Columns 2, 3, 6, 7 and 8 are at indices 1, 2, 5, 6 and 7;
    # split ' ' also discards the trailing newline, so no chomp is needed
    my ($atomno1, $atomname1, $x1, $y1, $z1) = (split ' ', $line1)[1, 2, 5, 6, 7];

    foreach my $line2 (@in2) {
        next unless $line2 =~ /\S/;
        my ($atomno2, $atomname2, $x2, $y2, $z2) = (split ' ', $line2)[1, 2, 5, 6, 7];

        # The distance in three dimensions needs x, y and z
        my $dis = sqrt(($x2 - $x1)**2 + ($y2 - $y1)**2 + ($z2 - $z1)**2);

        printf "%s %s %s %s %.3f %s %s %s %s %s %s\n",
            $atomno1, $atomname1, $atomno2, $atomname2, $dis,
            $x1, $y1, $z1, $x2, $y2, $z2;
    }
}
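The filehandle problem described in the update can be reproduced in isolation. This sketch assumes an in-memory filehandle (opened on a reference to a string) as a stand-in for pdb2.pdb, and the subroutine name count_inner_reads is hypothetical. It counts how many lines the inner loop reads on each pass of an outer loop, with and without rewinding the handle using seek.

```perl
use strict;
use warnings;

# In-memory stand-in for pdb2.pdb: two lines of illustrative data
my $data = "B1\nB2\n";

# Returns the number of lines read by the inner loop on each outer pass
sub count_inner_reads {
    my ($outer_count, $rewind) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";
    my @reads;
    for my $pass (1 .. $outer_count) {
        seek($fh, 0, 0) if $rewind;   # go back to the start of the "file"
        my $n = 0;
        while (defined(my $line = <$fh>)) {
            $n++;
        }
        push @reads, $n;              # without a rewind, later passes hit EOF at once
    }
    close($fh);
    return @reads;
}

print join(' ', count_inner_reads(3)), "\n";      # 2 0 0
print join(' ', count_inner_reads(3, 1)), "\n";   # 2 2 2
```

Rewinding with seek works too, but reading the file into an array once, as in the code above, is simpler and avoids re-reading from disk on every pass.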