I have this output:
10dvex2_miRNA_ce.out.data|6361
10dvex2_miRNA_ce.out.data|6361
10dvex2_misc_RNA_ce.out.data|0
10dvex2_rRNA_ce.out.data|239
Using this script in Perl:
#!/usr/bin/perl
use warnings;
use strict;
open(MYINPUTFILE, $ARGV[0]); # open for input
my @lines = <MYINPUTFILE>; # read file into list
my $count = 0;
print "Frag"."\t"."ncRNA"."\t"."Amount"."\n";
foreach my $lines (@lines){
    my $pattern = $lines;
    $pattern =~ s/(.*)dvex\d_(.*)_(.*).(out.data)\|(.*)/$1 $2 $3 $5/g;
    $count += $5;
    print $1."\t".$2.$3."\t".$5."\n";
}
close(MYINPUTFILE);
exit;
I extract this information:
Frag ncRNA Amount
10 miRNAce 6361
10 misc_RNAce 0
10 rRNAce 239
But in the Amount column I want to report these numbers divided by the overall total (6600). In this case, I want this output:
Frag ncRNA Amount
10 miRNAce 0.964
10 misc_RNAce 0
10 rRNAce 0.036
My problem is getting the TOTAL inside the loop so that I can normalize the data. Any ideas?
Answer 0 (score: 1)
Perhaps the following will help:
use strict;
use warnings;
my ( %hash, $total, %seen, @array );
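while (<>) {
    next if $seen{$_}++;    # skip identical lines
    # capture fragment, ncRNA name and count; skip lines that do not match
    next unless /(\d+).+?_([^.]+).+\|(\d+)$/;
    $hash{$1}{$2} = $3;
    $total += $3;           # running total for the normalization
}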
print "Frag\tncRNA\tAmount\n";
while ( my ( $key1, $val1 ) = each %hash ) {
    while ( my ( $key2, $val2 ) = each %$val1 ) {
        my $frac = $val2 / $total == 0 ? 0 : sprintf( '%.3f', $val2 / $total );
        push @array, "$key1\t$key2\t$frac\n";
    }
}
print map { $_->[0] }
  sort { $b->[1] <=> $a->[1] }
  map { [ $_, (split)[2] ] }
  @array;
Output for your dataset:
Frag ncRNA Amount
10 miRNA_ce 0.964
10 rRNA_ce 0.036
10 misc_RNA_ce 0
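This skips identical lines and then captures the required elements from each line. A running total is kept for the later calculation. Your desired output is sorted from high to low, which is why each record is pushed onto @array. However, if sorting isn't necessary, you can just print each line and omit the Schwartzian transform on @array.
Hope this helps!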
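If the sort is not needed, the records can be kept in an array in input order and printed once the total is known. The following is only a minimal sketch, assuming the same line format as in the question; the sub name normalize and the inline sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw lines and returns tab-separated rows
# in input order, with each count replaced by its share of the total.
sub normalize {
    my @lines = @_;
    my ( %seen, @records );
    my $total = 0;

    for my $line (@lines) {
        chomp $line;
        next if $seen{$line}++;    # skip identical lines
        # same capture as above: fragment, ncRNA name, count
        next unless $line =~ /(\d+).+?_([^.]+).+\|(\d+)$/;
        push @records, [ $1, $2, $3 ];
        $total += $3;
    }

    return map {
        my ( $frag, $name, $count ) = @$_;
        # a zero count is printed as a bare 0; this also avoids
        # dividing by a zero total when every count is 0
        my $frac = $count ? sprintf( '%.3f', $count / $total ) : 0;
        "$frag\t$name\t$frac";
    } @records;
}

# Sample lines copied from the question, including the repeated line
my @sample = (
    "10dvex2_miRNA_ce.out.data|6361\n",
    "10dvex2_miRNA_ce.out.data|6361\n",
    "10dvex2_misc_RNA_ce.out.data|0\n",
    "10dvex2_rRNA_ce.out.data|239\n",
);

print "Frag\tncRNA\tAmount\n";
print "$_\n" for normalize(@sample);
```

Because the rows come back in input order, the misc_RNA_ce line stays in the middle instead of moving to the end as it does in the sorted output.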
Answer 1 (score: 1)
For this you need two passes over the data.
#! /usr/bin/env perl
use warnings;
use strict;
print join("\t",qw'Frag ncRNA Amount'),"\n";
my @data;
my $total = 0;
# parse the lines
while( <> ){
  my @elem = /(.+?)(?>dvex)\d_(.+)_([^._]+)[.]out[.]data[|](\d+)/;
  next unless @elem;
  # running total
  $total += $elem[-1];
  # combine $2 and $3
  splice @elem, 1, 2, $2.$3; # $elem[1].$elem[2];
  push @data, \@elem;
}
# print them
for( @data ){
  my @copy = @$_;
  $copy[-1] = $copy[-1] / $total;
  $copy[-1] = sprintf('%.3f', $copy[-1]) if $copy[-1];
  print join("\t",@copy),"\n";
}
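One thing to watch with this two-pass version: it does not skip repeated lines, so with the sample input from the question, where the miRNA line appears twice, the total would come out as 12961 rather than 6600. Below is a small sketch of the difference, assuming duplicates should be counted once, as the question's total of 6600 implies; total_counts is a hypothetical helper, not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the counts that follow the '|' separator,
# optionally counting each distinct line only once.
sub total_counts {
    my ( $dedupe, @lines ) = @_;
    my %seen;
    my $total = 0;
    for my $line (@lines) {
        next if $dedupe && $seen{$line}++;      # skip repeats when asked to
        next unless $line =~ /[|](\d+)\s*$/;    # count at the end of the line
        $total += $1;
    }
    return $total;
}

# Sample lines copied from the question, including the repeated line
my @sample = (
    "10dvex2_miRNA_ce.out.data|6361\n",
    "10dvex2_miRNA_ce.out.data|6361\n",
    "10dvex2_misc_RNA_ce.out.data|0\n",
    "10dvex2_rRNA_ce.out.data|239\n",
);

printf "all lines:      %d\n", total_counts( 0, @sample );
printf "distinct lines: %d\n", total_counts( 1, @sample );
```

Declaring a %seen hash before the parsing loop and adding next if $seen{$_}++; as its first statement would be one way to get the 6600 total there.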