How can I normalize results in Perl using a foreach control structure?

Date: 2012-11-06 17:36:42

Tags: perl normalization

I have this output:

10dvex2_miRNA_ce.out.data|6361
10dvex2_miRNA_ce.out.data|6361
10dvex2_misc_RNA_ce.out.data|0
10dvex2_rRNA_ce.out.data|239

Using this Perl script:

#!/usr/bin/perl

use warnings;
use strict;

open(MYINPUTFILE, $ARGV[0]); # open for input
my @lines = <MYINPUTFILE>; # read file into list
my $count = 0;
print "Frag"."\t"."ncRNA"."\t"."Amount"."\n";

foreach my $lines (@lines){
    my $pattern = $lines;
    $pattern =~ s/(.*)dvex\d_(.*)_(.*).(out.data)\|(.*)/$1 $2   $3  $5/g;
    $count += $5;
    print $1."\t".$2.$3."\t".$5."\n";
}
close(MYINPUTFILE);
exit;

I extract this information:

Frag    ncRNA   Amount
10  miRNAce 6361
10  misc_RNAce  0
10  rRNAce  239

But in the Amount column I would like to report each number divided by the total (6600). In this case, I want this output:

Frag    ncRNA   Amount
10  miRNAce 0.964
10  misc_RNAce  0
10  rRNAce  0.036

My problem is getting the TOTAL inside the loop so that I can normalize the data. Any ideas?

2 answers:

Answer 0 (score: 1)

Perhaps the following will help:

use strict;
use warnings;

my ( %hash, $total, %seen, @array );

while (<>) {
    next if $seen{$_}++;
    /(\d+).+?_([^.]+).+\|(\d+)$/ or next;    # skip lines that do not match
    $hash{$1}{$2} = $3;
    $total += $3;
}

print "Frag\tncRNA\tAmount\n";

while ( my ( $key1, $val1 ) = each %hash ) {
    while ( my ( $key2, $val2 ) = each %$val1 ) {
        my $frac = $val2 / $total == 0 ? 0 : sprintf( '%.3f', $val2 / $total );
        push @array, "$key1\t$key2\t$frac\n";
    }
}

print map { $_->[0] }
  sort    { $b->[1] <=> $a->[1] }
  map { [ $_, (split)[2] ] }
  @array;

Output for your dataset:

Frag    ncRNA   Amount
10  miRNA_ce    0.964
10  rRNA_ce 0.036
10  misc_RNA_ce 0

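Identical lines are skipped, and the required elements are then captured from each remaining line. A running total is kept for the later calculation. Your desired output appears to be sorted from high to low, which is why each record is pushed onto @array. If sorting isn't necessary, you can simply print each line and omit the Schwartzian transform on @array.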
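For the unsorted case mentioned above, here is a minimal sketch. The helper name normalize_unsorted and the inline sample lines are illustrative assumptions, not part of the original answer. The records still have to be stored until the total is known, so an array (rather than a hash) is used to keep the input order:

```perl
use strict;
use warnings;

# Sample lines copied from the question, including the duplicated first line.
my @lines = (
    "10dvex2_miRNA_ce.out.data|6361",
    "10dvex2_miRNA_ce.out.data|6361",
    "10dvex2_misc_RNA_ce.out.data|0",
    "10dvex2_rRNA_ce.out.data|239",
);

# Hypothetical helper: returns one "Frag\tncRNA\tAmount" string per
# unique input line, in input order (no sorting).
sub normalize_unsorted {
    my @input = @_;
    my ( %seen, @records );
    my $total = 0;

    for my $line (@input) {
        next if $seen{$line}++;    # skip exact duplicates

        # Same pattern as in the answer; skip lines that do not match.
        my ( $frag, $rna, $count ) = $line =~ /(\d+).+?_([^.]+).+\|(\d+)$/
            or next;

        push @records, [ $frag, $rna, $count ];
        $total += $count;          # running total
    }

    # The total is complete here, so the fractions can be computed.
    # A non-zero count implies a non-zero total, so the division is safe.
    return map {
        my ( $frag, $rna, $count ) = @$_;
        my $frac = $count ? sprintf( '%.3f', $count / $total ) : 0;
        "$frag\t$rna\t$frac";
    } @records;
}

print "Frag\tncRNA\tAmount\n";
print "$_\n" for normalize_unsorted(@lines);
```

With real data you would pass the lines read from your file (after chomp) instead of the inline list.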

Hope this helps!

Answer 1 (score: 1)

To do this, you need to make two passes over the data.

#! /usr/bin/env perl

use warnings;
use strict;

print join("\t",qw'Frag ncRNA Amount'),"\n";

my @data;
my $total = 0;

# parse the lines
while( <> ){
  my @elem = /(.+?)(?>dvex)\d_(.+)_([^._]+)[.]out[.]data[|](\d+)/;
  next unless @elem;

  # running total
  $total += $elem[-1];

  # combine $2 and $3
  splice @elem, 1, 2, $2.$3; # $elem[1].$elem[2];

  push @data, \@elem;
}

# print them
for( @data ){
  my @copy = @$_;
  $copy[-1] = $copy[-1] / $total;
  $copy[-1] = sprintf('%.3f', $copy[-1]) if $copy[-1];
  print join("\t",@copy),"\n";
}
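
Since the question asks about foreach specifically, the same two-pass idea can be sketched with two foreach loops over the array the original script already builds. This is only a sketch under a few assumptions: the helper name normalize_foreach and the inline sample lines are made up for illustration, the dots in the asker's pattern are escaped, and a repeated line is counted once, which is what the 6600 total in the question implies.

```perl
use strict;
use warnings;

# Sample input copied from the question; in the original script these
# lines come from the file named in $ARGV[0].
my @lines = (
    "10dvex2_miRNA_ce.out.data|6361",
    "10dvex2_miRNA_ce.out.data|6361",
    "10dvex2_misc_RNA_ce.out.data|0",
    "10dvex2_rRNA_ce.out.data|239",
);

# Hypothetical helper: returns [frag, ncRNA, fraction] rows.
sub normalize_foreach {
    my @input = @_;

    # Assumption: a repeated line counts only once.
    my %seen;
    my @unique = grep { !$seen{$_}++ } @input;

    # Pass 1: parse every line and accumulate the total.
    my @parsed;
    my $total = 0;
    foreach my $line (@unique) {
        next unless $line =~ /^(.*)dvex\d_(.*)_(.*)\.out\.data\|(\d+)/;
        push @parsed, [ $1, $2 . $3, $4 ];
        $total += $4;
    }

    # Pass 2: the total is now known, so each amount can be divided by it.
    my @rows;
    foreach my $rec (@parsed) {
        my ( $frag, $name, $amount ) = @$rec;
        my $frac = $total ? $amount / $total : 0;    # avoid dividing by zero
        push @rows, [ $frag, $name, $frac ? sprintf( '%.3f', $frac ) : 0 ];
    }
    return @rows;
}

print join( "\t", qw(Frag ncRNA Amount) ), "\n";
print join( "\t", @$_ ), "\n" for normalize_foreach(@lines);
```

The total cannot be known inside the first loop, which is why the printing is deferred to a second loop rather than done while parsing.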