Parsing a swiss file with Perl

Time: 2019-05-23 09:05:09

Tags: perl

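I am a beginner in Perl, and I want to parse some fields from a swiss file into a text file. I have worked out how to extract the ID from the file, but that is all so far. I need to get both the ID and the AC from the file.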

My swiss file looks like this:

ID   140U_DROME Reviewed; 261 AA.

AC   P81928; Q9VFM8;

SQ   SEQUENCE   261 AA;  29182 MW;  5DB78CF6CFC4435A CRC64;

     MNFLWKGRRF LIAGILPTFE GAADEIVDKE NKTYKAFLAS KPPEETGLER LKQMFTIDEF
     GSISSELNSV YQAGFLGFLI GAIYGGVTQS RVAYMNFMEN NQATAFKSHF DAKKKLQDQF
     TVNFAKGGFK WGWRVGLFTT SYFGIITCMS VYRGKSSIYE YLAAGSITGS LYKVSLGLRG
     MAAGGIIGGF LGGVAGVTSL LLMKASGTSM EEVRYWQYKW RLDRDENIQQ AFKKLTEDEN
     PELFKAHDEK TSEHVSLDTI K
//

My code:

open(IN, "<transmem_proteins.swiss") or die "Cant open the file";
open(OUT, ">text.txt") or die "Cant open the file";
while(<IN>){

    if($_=~/^ID\s{3}(\S+\s)/){

        print OUT ">$1| \n";
        print OUT "// \n";
    }
}

1 answer:

Answer 0 (score: 0)

Here is an example of how to extract data from a swiss file:

use feature qw(say);
use strict;
use warnings;

{
    my $data = read_swiss_file();
    my @ids;
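    for my $chunk ( @$data ) {
        my ( $item1, $item2, $item3);
        # ID line: entry name, status, then the length, e.g. "261 AA."
        # \s*$ (rather than \s+$) also matches when nothing follows the period
        if( $chunk =~ /^ID\s{3}(\S+)\s+\S+;\s+(.*)\.\s*$/m ){
            $item1 = $1;
            $item2 = $2;
            $item2 =~ s/\s+//;   # "261 AA" becomes "261AA"
        }
        # AC line: capture only the first accession number
        if( $chunk =~ /^AC\s{3}(\S+);/m ){
            $item3 = $1;
        }
        # Skip chunks without an ID line (e.g. the text after the last //)
        push @ids, [$item1, $item2, $item3] if defined $item1;
    }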

    my $fn = 'text.txt';
    open ( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    for my $items (@ids) {
        say $fh "->", join '|', @$items;
    }
    close $fh;
}

sub read_swiss_file {
    my $fn = 'transmem_proteins.swiss';
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    my $str = do { local $/; <$fh> };
    close $fh;
    my @chunks = split /(?m:^\/\/)/, $str;
    return \@chunks;
}
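
Note that the AC pattern above keeps only the first accession number on the line. If every accession is wanted, one possible approach is to walk through a record line by line and split the AC line on semicolons. The following is only a minimal sketch under that assumption: parse_record is a hypothetical helper that is not part of the code above, and it takes one record as a string (for example, one chunk returned by read_swiss_file).

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the answer's code):
# parse one record, given as a string, into its ID and all accessions.
sub parse_record {
    my ($record) = @_;
    my %entry = ( id => undef, ac => [] );
    for my $line ( split /\n/, $record ) {
        $line =~ s/\s+$//;    # drop trailing whitespace, including "\r"
        if ( $line =~ /^ID\s+(\S+)/ ) {
            $entry{id} = $1;
        }
        elsif ( $line =~ /^AC\s+(.*)/ ) {
            # An AC line holds one or more accessions, each ended by ';'
            push @{ $entry{ac} }, grep { length } split /;\s*/, $1;
        }
    }
    return \%entry;
}

# Sample record taken from the question
my $record = <<'END';
ID   140U_DROME Reviewed; 261 AA.
AC   P81928; Q9VFM8;
END

my $entry = parse_record($record);
print ">$entry->{id}|", join( ',', @{ $entry->{ac} } ), "\n";
# prints ">140U_DROME|P81928,Q9VFM8"
```

Because accessions are pushed onto a list, a record whose accessions continue over several AC lines would be collected the same way.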