Standardization using Perl

Time: 2013-01-30 19:43:11

Tags: linux perl

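I am trying to standardize my data (subtract the mean from each raw value and divide by the stdev) with an automated script. I have a file with 10,000 rows. I first need to compute the mean and stdev of each column, and then use those values to compute the new standardized values. I can do this easily in Excel, but I am looking for an automated script.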
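In symbols (notation added here for clarity, not taken from the question), for column $j$ with $n$ rows the procedure described above is:

$$
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}
$$

Here $s_j$ is the sample standard deviation, with $n-1$ in the denominator; this is what the `stdev` sub in the question computes and what both answers use. Dividing by $n$ instead gives the population standard deviation, which is slightly smaller and therefore makes every $|z_{ij}|$ slightly larger.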

Input:

DOTR1   10.29006    10.06744    10.47105    10.05041    10.18407    9.770205    10.90548    10.75112
RCC2    6.699481    7.240353    7.263434    6.654058    6.86063     7.151931    6.796337    6.78525
HHPA6   7.31182     7.547056    8.338827    7.278408    7.545548    7.409964    7.149899    7.300342
PAX8    8.336847    8.651292    8.493323    8.5056      8.445139    8.651406    8.664237    8.56571
ACA1A   4.233111    4.320666    4.232803    4.390224    4.269969    4.314899    4.264211    4.142419
UBA7    8.196608    8.164725    7.361889    8.055019    8.882745    7.6884      7.835754    8.354209
OOA     5.098222    5.212986    5.301191    5.211401    5.13133     5.153725    5.269111    5.195991
ACX1    4.875679    5.01305     4.921618    4.930978    4.899562    4.92918     4.970339    4.986362

The mean of column 1 is 6.880 and its stdev is 2.066.

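I then subtract the mean from each observation and divide by the stdev, e.g. (10.29006 - 6.880) / 2.066 for the first value, and repeat this row by row for every observation in column 1. For column 2, I compute its own mean and stdev and follow the same steps, and so on for each column.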
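As a quick check of the arithmetic above, here is a minimal Perl sketch that reproduces the column-1 numbers from the sample input. The variable names are illustrative only, and it assumes the sample (n - 1) standard deviation, which is the one that matches the 2.066 quoted in the question (at full precision it is about 2.0668, so 2.066 is truncated rather than rounded).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column 1 of the sample input, in row order (DOTR1 first).
my @col1 = (10.29006, 6.699481, 7.31182, 8.336847,
            4.233111, 8.196608, 5.098222, 4.875679);

my $n = @col1;

# Mean: sum of the values divided by n.
my $sum = 0;
$sum += $_ for @col1;
my $mean = $sum / $n;

# Sample standard deviation: squared deviations divided by n - 1.
my $sq = 0;
$sq += ($_ - $mean) ** 2 for @col1;
my $sd = sqrt($sq / ($n - 1));

# z-score of the first observation (DOTR1).
my $z = ($col1[0] - $mean) / $sd;

printf "mean = %.3f, stdev = %.4f, z(DOTR1) = %.3f\n", $mean, $sd, $z;
```

Running it prints `mean = 6.880, stdev = 2.0668, z(DOTR1) = 1.650`; the 1.650 agrees with the first value in the output of the second answer.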

Thanks,

I tried the following code to get the average and stdev, but I'm stuck on the next step.

# Mean of the values in an array reference.
sub average {
    my ($data) = @_;
    if (not @$data) {
        die("Empty array\n");
    }
    my $total = 0;
    foreach (@$data) {
        $total += $_;
    }
    my $average = $total / @$data;
    return $average;
}

# Sample standard deviation (divides by n - 1) of an array reference.
sub stdev {
    my ($data) = @_;
    if (@$data == 1) {
        return 0;
    }
    my $average = average($data);
    my $sqtotal = 0;
    foreach (@$data) {
        $sqtotal += ($average - $_) ** 2;
    }
    my $std = ($sqtotal / (@$data - 1)) ** 0.5;
    return $std;
}
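A minimal sketch of the missing next step, assuming the input is whitespace-separated with the name in the first field (as in the sample): collect each column into an array, call `average` and `stdev` once per column, then replace every value with its z-score. `standardize_columns` is a hypothetical helper name, and the command-line handling at the bottom is just one way to wire it up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two subs from the question, lightly tidied (no & sigil, postfix loops).
sub average {
    my ($data) = @_;
    die "Empty array\n" unless @$data;
    my $total = 0;
    $total += $_ for @$data;
    return $total / @$data;
}

sub stdev {
    my ($data) = @_;
    return 0 if @$data == 1;
    my $average = average($data);
    my $sqtotal = 0;
    $sqtotal += ($average - $_) ** 2 for @$data;
    return ($sqtotal / (@$data - 1)) ** 0.5;    # sample stdev (n - 1)
}

# Hypothetical helper: takes rows like [name, v1, v2, ...] and returns
# copies in which every numeric column is replaced by its z-scores.
sub standardize_columns {
    my @rows = @_;
    return unless @rows;
    my @out = map { [@$_] } @rows;    # copy so the input rows stay intact
    for my $col (1 .. $#{ $rows[0] }) {
        my @column = map { $_->[$col] } @rows;    # one column as a flat list
        my $mean   = average(\@column);
        my $sd     = stdev(\@column);
        die "Column $col has zero stdev\n" if $sd == 0;    # avoid dividing by 0
        $_->[$col] = ($_->[$col] - $mean) / $sd for @out;
    }
    return @out;
}

# Only runs when file names are given on the command line: reads
# whitespace-separated rows, skips blank lines, prints tab-separated z-scores.
if (@ARGV) {
    my @rows = map { [split] } grep { /\S/ } <>;
    for my $row (standardize_columns(@rows)) {
        my ($name, @z) = @$row;
        print join("\t", $name, map { sprintf '%.3f', $_ } @z), "\n";
    }
}
```

Run it as, for example, `perl standardize.pl input` (the script name is arbitrary). Because it uses the same n - 1 convention, the printed values should agree with the three-decimal output of the second answer. Each column is read once for the mean and once more for the stdev, which is fine for a 10,000-row file.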

2 Answers:

Answer 0 (score: 0)

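Just use an array of arrays to represent the table. Walk through the table column by column, compute the mean and stdev, and then replace each value in the column with its standardized value.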

#!/usr/bin/perl
use warnings;
use strict;

open my $IN, '<', 'input' or die $!;

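# Array of arrays: one row per line, first field is the row name.
my @table;

while (<$IN>) {
    push @table, [ split ] if /\S/;    # skip blank lines (e.g. a trailing one)
}
close $IN;

# Column 0 holds the names, so start at column 1.
for my $column (1 .. $#{ $table[0] }) {

    my $total = 0;
    $total   += $_ for map $table[$_][$column], 0 .. $#table;
    my $mean  = $total / @table;

    # $#table is n - 1, so this is the sample standard deviation.
    my $sqtot = 0;
    $sqtot   += ($mean - $_) ** 2 for map $table[$_][$column], 0 .. $#table;
    my $stdev = ($sqtot / $#table) ** 0.5;

    # Replace each value in the column with its z-score.
    $table[$_][$column] = ($table[$_][$column] - $mean) / $stdev for 0 .. $#table;
}

$\ = "\n";    # output record separator: every print ends with a newline
for my $line (@table) {
    print join "\t", @$line;
}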
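To confirm that a script like the one above did what was intended, note that every standardized column should have a mean of about 0 and a sample stdev of about 1. A minimal sketch of such a check, assuming the same array-of-arrays layout with names in column 0; `check_standardized` and its default tolerance of 1e-9 are illustrative choices, not part of the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical checker: after standardization, every numeric column of the
# table (array of arrays, names in column 0) should have mean ~0 and sample
# stdev ~1. Returns a list of problem descriptions; an empty list means OK.
sub check_standardized {
    my ($table, $tol) = @_;
    $tol = 1e-9 unless defined $tol;
    my $n = @$table;
    return ('need at least two rows') if $n < 2;

    my @problems;
    for my $col (1 .. $#{ $table->[0] }) {
        my @v = map { $_->[$col] } @$table;

        my $sum = 0;
        $sum += $_ for @v;
        my $mean = $sum / $n;

        my $sq = 0;
        $sq += ($_ - $mean) ** 2 for @v;
        my $sd = sqrt($sq / ($n - 1));    # same n - 1 convention as the answer

        push @problems, sprintf('column %d: mean %g, stdev %g', $col, $mean, $sd)
            if abs($mean) > $tol || abs($sd - 1) > $tol;
    }
    return @problems;
}
```

In the script above it could be called just before the output loop, e.g. `my @p = check_standardized(\@table); warn "$_\n" for @p;`. If you check values that have already been rounded to three decimals, use a looser tolerance such as 0.01.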

Answer 1 (score: 0)

I thought I'd post the solution I came up with, although it's not as simple as choroba's. It uses Statistics::Descriptive.

Update: Well, this isn't a very good solution: it creates three arrays just to get one result. Disregard this solution.

#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Descriptive;

# Each line becomes [name, v1, v2, ...].
my @data = map [split], <DATA>;

# Turn columns into rows so each column can be fed to Statistics::Descriptive.
my @transpose = transpose(@data);
my @stats;

# Skip row 0 of the transpose (the names); store [mean, stdev] per column.
for my $row (@transpose[1 .. $#transpose]) {
    my $stat = Statistics::Descriptive::Full->new;
    $stat->add_data($row);
    push @stats, [$stat->mean, $stat->standard_deviation];
}

my @new;

# Standardize each value with its column's mean and stdev.
for my $r (0 .. $#data) {
    my @tmp;
    for my $c (1 .. $#{$data[$r]}) {
        push @tmp, ($data[$r][$c] - $stats[$c-1][0]) / $stats[$c-1][1];
    }
    push @new, [$data[$r][0], map {sprintf "%.3f", $_} @tmp];
}

# output loop
for my $row (@new) {
    print join("\t", @$row), "\n";
}

# Swap rows and columns of an array of arrays.
sub transpose {
    my @array = @_;

    my @trans;
    for my $i (0 .. $#array) {
        for my $j (0 .. $#{$array[$i]}) {
            $trans[$j][$i] = $array[$i][$j];
        }
    }
    return @trans;
}

__DATA__
DOTR1   10.29006    10.06744    10.47105    10.05041    10.18407    9.770205    10.90548    10.75112
RCC2    6.699481    7.240353    7.263434    6.654058    6.86063 7.151931    6.796337    6.78525
HHPA6   7.31182 7.547056    8.338827    7.278408    7.545548    7.409964    7.149899    7.300342
PAX8    8.336847    8.651292    8.493323    8.5056  8.445139    8.651406    8.664237    8.56571
ACA1A   4.233111    4.320666    4.232803    4.390224    4.269969    4.314899    4.264211    4.142419
UBA7    8.196608    8.164725    7.361889    8.055019    8.882745    7.6884  7.835754    8.354209
OOA 5.098222    5.212986    5.301191    5.211401    5.13133 5.153725    5.269111    5.195991
ACX1    4.875679    5.01305 4.921618    4.930978    4.899562    4.92918 4.970339    4.986362

Output:

C:\Old_Data\perlp>perl t33.pl
DOTR1   1.650   1.516   1.624   1.610   1.490   1.502   1.797   1.698
RCC2    -0.087  0.106   0.102   -0.117  -0.079  0.140   -0.085  -0.102
HHPA6   0.209   0.259   0.612   0.200   0.245   0.274   0.077   0.132
PAX8    0.705   0.810   0.686   0.824   0.669   0.920   0.770   0.706
ACA1A   -1.281  -1.349  -1.335  -1.268  -1.301  -1.336  -1.244  -1.302
UBA7    0.637   0.567   0.149   0.595   0.875   0.419   0.391   0.610
OOA     -0.862  -0.904  -0.829  -0.851  -0.895  -0.900  -0.784  -0.824
ACX1    -0.970  -1.004  -1.009  -0.993  -1.004  -1.017  -0.921  -0.919