perl extract from hash

Date: 2018-05-30 07:40:49

Tags: perl hash

I have a tab-delimited text file like this:

data    S1  S2  S3  S4  S5  S6
data1   0   0   0   0   0   0
data2   0   5   3   5   0.1 0.9
data3   0   3   9   3   0   0.01
data4   0   0   4   4   0   0
data5   2   5   11  7   5   0.2
data6   0   0   0   8.  0   0
data7   0   1   5   2   06  0.04
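The %row approach in the code below rests on one idea: read the header once, then assign each row's fields to a hash with a hash slice, so every value can be looked up by column name. A minimal sketch of just that step, assuming single tabs between fields and the header on the first line (parse_table is a hypothetical helper name, not from the question):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn tab-delimited lines (header first) into an
# array ref of header names and an array ref of row hashes.
# Assumes every row has the same columns as the header.
sub parse_table {
    my @lines = @_;
    chomp(my $first = shift @lines);
    my @header = split /\t/, $first;
    my @rows;
    for my $line (@lines) {
        chomp $line;
        next if $line eq '';                 # skip blank lines
        my %row;
        @row{@header} = split /\t/, $line;   # hash slice: column name => field
        push @rows, \%row;
    }
    return (\@header, \@rows);
}

# Usage with the first rows of the example table.
my ($header, $rows) = parse_table(
    "data\tS1\tS2\tS3\tS4\tS5\tS6",
    "data1\t0\t0\t0\t0\t0\t0",
    "data2\t0\t5\t3\t5\t0.1\t0.9",
);
print "$_->{S3}\t$_->{data}\n" for @$rows;   # prints 0<TAB>data1, then 3<TAB>data2
```

Once the rows are hashes, $row->{S3} or $row->{S6} picks out a column by name, which is what the per-column files need.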

Well, the file's structure is a bit more complex; it's a metagenomic file, like:

D_0__Archaea; D_1__Euryarchaeota; D_2__Thermoplasmata; D_3__Thermoplasmatales; D_4__Marine Group II; D_5__cultured archaeon 0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0

Everything from D_0__ to D_5__ = the first column (data in the example); each number represents one column (S).

But in the end, it's similar!!!!
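For the metagenomic version, the only extra work is the first field. A minimal sketch, assuming the taxonomy string and the counts are separated by tabs (as in the simple example) and the taxonomy levels by "; " (split_taxon_line is a hypothetical helper; the s///r modifier needs Perl 5.14 or later):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split one metagenomic line into its taxonomy levels
# (with the D_0__ ... D_5__ prefixes removed) and its per-sample counts.
# Assumes the taxonomy string is the first tab-separated field.
sub split_taxon_line {
    my ($line) = @_;
    chomp $line;
    my ($taxon, @counts) = split /\t/, $line;
    my @levels = map { s/^D_\d+__//r } split /;\s*/, $taxon;
    return (\@levels, \@counts);
}

my $line = join "\t",
    'D_0__Archaea; D_1__Euryarchaeota; D_2__Thermoplasmata; '
      . 'D_3__Thermoplasmatales; D_4__Marine Group II; D_5__cultured archaeon',
    qw(0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0);
my ($levels, $counts) = split_taxon_line($line);
print join(' | ', @$levels), "\n";   # Archaea | Euryarchaeota | ... | cultured archaeon
print scalar(@$counts), " counts\n"; # 12 counts
```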

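What I want is to use the %row hash to extract the first column (data) together with a single @label_match column (s3), and then print them to their own txt file. I mean, if I want s3 and s6, print out something like this: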

S3_file.txt (each file is named after its column):

s3   data # don't print this line, it's only here to explain !!!
0    data1
3    data2
9    data3
4    data4
11   data5
0    data6
5    data7

S6_file.txt:

0    data1
0.9  data2
0.01 data3
0    data4
0.2  data5
0    data6
0.04 data7
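One way to get exactly this layout is to read all rows first and then loop over the requested columns, opening one file per column. A minimal sketch, assuming each row is already a hash keyed by the header names and that the label column is called data (write_column_files is a hypothetical name):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given row hashes (e.g. built with a hash slice from
# the header line), write one "<column>_file.txt" per requested column.
# Each line is "value<TAB>label" with no header line, as in S3_file.txt /
# S6_file.txt above. Assumes the label column is named "data".
sub write_column_files {
    my ($rows, @columns) = @_;
    my @written;
    for my $col (@columns) {
        my $file = "${col}_file.txt";
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} "$_->{$col}\t$_->{data}\n" for @$rows;
        close $out or die "Cannot close $file: $!";
        push @written, $file;
    }
    return @written;
}

my @rows = (
    { data => 'data1', S3 => '0', S6 => '0'    },
    { data => 'data2', S3 => '3', S6 => '0.9'  },
    { data => 'data3', S3 => '9', S6 => '0.01' },
);
print join(' ', write_column_files(\@rows, qw(S3 S6))), "\n";   # S3_file.txt S6_file.txt
```

Looping over columns on the outside opens each file only once; the trade-off is that every row is held in memory until the files are written.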

I have this code. I think in the %row part I have to do a foreach loop to extract each @label_match one by one, but I don't know how. Here is my code:

#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
use Getopt::Long;
use List::Util qw(sum);


my ($infile_taxon, $search_label, $output_file, $help, $help_desc, $options, $options_desc, $keep_file);

GetOptions (
    't=s'       =>\$infile_taxon,
    's=s'       =>\$search_label,
    'kf'        =>\$keep_file,
    'o=s'       =>\$output_file,
    'h'         =>\$help,
    'op'        =>\$options
);

#---------------------------Subroutine to clean the selected Taxon  --------------------
sub Taxon_Clean {
    my (@clean_result);
    foreach (@_){
        chomp;
        if ($_ =~ s/D_0__//g | s/;D_\d__/\t/g | s/;/\t/g){
            push @clean_result, $_;
        }
    }

    return @clean_result;
}

#------------------------------------------------------ Open Files-------------------
open INFILE_TAXONOMY, '<', "$infile_taxon" or die $!;

my (@taxon, @sample_names);

#------------------------------------------------------ Taxon -----------------------
my ( @header, @label_match, @not_match, @taxon_filter);
while (<INFILE_TAXONOMY>){
    chomp;
    if ($_=~ m/^$|Constructed from biom file/g)  {
        next;
    }
    elsif ($_=~ s/OTU ID/Taxon/g){
        chomp ( @header = split '\t', $_ );

#------------------------------------------------------ Search Label ----------------
        if ($search_label){
            my @label_wanted= split (/\,/, $search_label); 
            unshift @label_wanted, '#Taxon';
            @label_wanted = uniq (@label_wanted);
            foreach (@label_wanted){
                my $unit =$_;
                chomp $unit;
                if (my @match_wanted= grep (/$unit/, @header)){
                    push (@label_match, @match_wanted);
                }
                else {
                    push (@not_match, $unit);
                }
            }

#                                --------- Check Point ---------

            push (my @defined_elements, @label_match);
            shift @defined_elements;

            if (! @defined_elements){
                print "\n\tNone of the Search Samples \" $search_label \" "
                  . "Were Found in \" $infile_taxon \" File !!!\n\n";
                exit;
            }

            elsif (grep {defined($_)} @defined_elements){  
                if (grep {defined($_)} @not_match){
                    print "\n\tSamples Not Found: @not_match\n\n";
                }
            }
        }
    }
    elsif ($_=~ m/^#/g){
        next;
    }

    elsif ($search_label) {  
        my %row;
        @row{@header} = split '\t'; 
        my $filter = join("\t", @row{@label_match}) . "\n";
        push (@taxon_filter, $filter);
        #print Dumper (\%row);
    }
    else {
        push (@taxon, $_); 
    }
}


# The next section extracts all the wanted columns into a single file,
# but this is where I want to extract each column into a separate file !!!



open OUTPUT, '>', "Taxonomic_results_file.txt" or die "can't create the output file: $!";

foreach (@taxon_filter){
    chomp $_;
    my ($tax, @values) = split '\t', $_;
    my $unit_val = join("\t", @values);
    my $sum_elements = sum (@values);
    if ($sum_elements == 0){
        next;
    }
    else {
        push (my @tx, $tax);
        @tx = Taxon_Clean (@tx);
        print OUTPUT "$unit_val\t@tx\n";
    }
}


close INFILE_TAXONOMY;
close OUTPUT;
exit;

Thanks a lot
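For the final section of the script, an alternative to collecting @taxon_filter is to open one output filehandle per wanted column up front (kept in a hash keyed by column name) and print to all of them while reading. A minimal streaming sketch, assuming the header is the first line of the input and the first column holds the label; it leaves out the question's handling of comment lines and the Taxon_Clean step, and split_columns is a hypothetical name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read a tab-delimited table from $in (header on the
# first line) and, while streaming, write each wanted column to
# "<column>_file.txt" as "value<TAB>first-column" lines.
# Names are matched exactly via a hash lookup.
sub split_columns {
    my ($in, @wanted) = @_;
    chomp(my $first = <$in>);
    my @header    = split /\t/, $first;
    my %in_header = map { $_ => 1 } @header;
    my @found     = grep { $in_header{$_} } @wanted;

    my %fh;    # column name => output filehandle, opened once
    for my $col (@found) {
        open $fh{$col}, '>', "${col}_file.txt"
            or die "Cannot write ${col}_file.txt: $!";
    }

    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';
        my %row;
        @row{@header} = split /\t/, $line;   # same hash slice as in the question
        print { $fh{$_} } "$row{$_}\t$row{$header[0]}\n" for @found;
    }
    close $_ for values %fh;
    return @found;
}

# Usage: an in-memory string stands in for the real input file.
my $text = "data\tS1\tS3\tS6\ndata1\t0\t0\t0\ndata2\t0\t3\t0.9\n";
open my $in, '<', \$text or die $!;
my @made = split_columns($in, qw(S3 S6 S9));
print "Written: @made\n";   # S9 is not in the header, so only S3 and S6 are written
```

The exact-match lookup also sidesteps a quirk of grep(/$unit/, @header) in the question: a requested S1 would also pick up S10, S11 and so on, if such columns exist.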

1 answer:

Answer 0 (score: 1)

You're already using @row{@header} type syntax a lot. That's a hash slice, which means you can match multiple elements based on the hash keys.

You can do much the same for the output:

open ( my $s3_file, '>', 'S3_file.txt' ) or warn $!;
my @output_fields = qw ( s3 data ); #matches column headings

And within the block further down, where you fill %row:
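Inside that block, the same kind of slice can be printed straight to a filehandle such as $s3_file. A minimal sketch, assuming the header names from the question's example (print_fields is a hypothetical helper, not part of the answer). Hash keys are case-sensitive, so qw(s3 data) only matches if the header really says s3; with the example header it has to be S3:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: print one row's selected fields, tab-joined, to a
# filehandle, using the hash slice @$row{@fields}. Values come back in
# the order of @fields.
sub print_fields {
    my ($out, $row, @fields) = @_;
    print {$out} join("\t", @$row{@fields}), "\n";
}

my @header = qw(data S1 S2 S3 S4 S5 S6);
my %row;
@row{@header} = split /\t/, "data2\t0\t5\t3\t5\t0.1\t0.9";

my @output_fields = qw(S3 data);                  # same idea as in the answer
print_fields(\*STDOUT, \%row, @output_fields);    # prints 3<TAB>data2
```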