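I am trying to standardize my data (subtract the mean from each raw value and divide by the stdev) with an automated script. I have a file with 10,000 rows. I first need to compute the mean and stdev of each column, and then use those values to derive the new standardized values. I can do this easily in Excel, but I am looking for an automated script.
Input: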
DOTR1 10.29006 10.06744 10.47105 10.05041 10.18407 9.770205 10.90548 10.75112
RCC2 6.699481 7.240353 7.263434 6.654058 6.86063 7.151931 6.796337 6.78525
HHPA6 7.31182 7.547056 8.338827 7.278408 7.545548 7.409964 7.149899 7.300342
PAX8 8.336847 8.651292 8.493323 8.5056 8.445139 8.651406 8.664237 8.56571
ACA1A 4.233111 4.320666 4.232803 4.390224 4.269969 4.314899 4.264211 4.142419
UBA7 8.196608 8.164725 7.361889 8.055019 8.882745 7.6884 7.835754 8.354209
OOA 5.098222 5.212986 5.301191 5.211401 5.13133 5.153725 5.269111 5.195991
ACX1 4.875679 5.01305 4.921618 4.930978 4.899562 4.92918 4.970339 4.986362
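The mean of column 1 is 6.880 and the stdev is 2.066.
I would now subtract the mean from each observation and divide by the stdev, e.g. (10.29006 - 6.880) / 2.066. I would do this row by row for every subsequent observation in column 1. For column 2, I would find its own mean and stdev and follow the same steps.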
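Written out, the procedure described above is the usual z-score. The symbols here are assumptions introduced for illustration: x_{ij} is the value in row i of column j, and n is the number of rows. The n - 1 denominator matches what the posted stdev sub computes.

```latex
\[
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij}-\bar{x}_j}{s_j}
\]
\[
z_{11} = \frac{10.29006 - 6.880}{2.066} \approx 1.65
\]
```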
Thanks,
I tried the following code to get the avg and stdev, but I am stuck on the next step.
# Mean of the values in an array reference.
sub average {
    my ($data) = @_;
    if (not @$data) {
        die("Empty array\n");
    }
    my $total = 0;
    foreach (@$data) {
        $total += $_;
    }
    my $average = $total / @$data;
    return $average;
}

# Sample standard deviation (n - 1 denominator) of an array reference.
sub stdev {
    my ($data) = @_;
    if (@$data == 1) {
        return 0;
    }
    my $average = average($data);
    my $sqtotal = 0;
    foreach (@$data) {
        $sqtotal += ($average - $_) ** 2;
    }
    my $std = ($sqtotal / (@$data - 1)) ** 0.5;
    return $std;
}
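The missing next step can be sketched as follows for a single column. This is only a minimal sketch: the `standardize` helper and its choice to die on a zero standard deviation are assumptions made for illustration, not part of the original post. The two subs from the question are repeated so that the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mean of the values in an array reference (as in the question).
sub average {
    my ($data) = @_;
    die "Empty array\n" unless @$data;
    my $total = 0;
    $total += $_ for @$data;
    return $total / @$data;
}

# Sample standard deviation, n - 1 denominator (as in the question).
sub stdev {
    my ($data) = @_;
    return 0 if @$data == 1;
    my $average = average($data);
    my $sqtotal = 0;
    $sqtotal += ($average - $_) ** 2 for @$data;
    return ($sqtotal / (@$data - 1)) ** 0.5;
}

# Hypothetical helper: returns a new array reference holding
# (value - mean) / stdev for every value in the column.
sub standardize {
    my ($data) = @_;
    my $mean = average($data);
    my $sd   = stdev($data);
    # Assumption: a constant column is treated as an error, since
    # dividing by a zero standard deviation is undefined.
    die "Zero standard deviation\n" if $sd == 0;
    return [ map { ($_ - $mean) / $sd } @$data ];
}

# Column 1 of the sample input from the question.
my @column = (10.29006, 6.699481, 7.31182, 8.336847,
              4.233111, 8.196608, 5.098222, 4.875679);
printf "%.3f\n", $_ for @{ standardize(\@column) };
```

For the whole file, the same helper would be called once per column after the rows have been split into columns.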
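Answer 0: (score: 0)
Just use an array of arrays to represent the table. Walk the table column by column, compute the mean and stdev, then replace each value in the column.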
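#!/usr/bin/perl
use warnings;
use strict;

open my $IN, '<', 'input' or die $!;

# Read the file into an array of arrays: one row per line,
# with the label in element 0 and the values after it.
my @table;
while (<$IN>) {
    $table[$. - 1] = [ split ];
}

# Column 0 holds the labels, so start at column 1.
for my $column (1 .. $#{ $table[0] }) {
    my $total = 0;
    $total += $_ for map $table[$_][$column], 0 .. $#table;
    my $mean = $total / @table;

    my $sqtot = 0;
    $sqtot += ($mean - $_) ** 2 for map $table[$_][$column], 0 .. $#table;
    my $stdev = ($sqtot / $#table) ** 0.5;    # sample stdev: divides by n - 1

    # Replace each value in the column with its standardized value.
    $table[$_][$column] = ($table[$_][$column] - $mean) / $stdev for 0 .. $#table;
}

# Print the table tab-separated, one row per line.
$\ = "\n";
for my $line (@table) {
    print join "\t", @$line;
}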
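Answer 1: (score: 0)
I thought I would post a solution I came up with, although it is not as simple as choroba's. It uses Statistics::Descriptive.
Update: Hmm, this is not a very good solution: it creates 3 arrays for a single solution. Disregard this solution.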
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Descriptive;

# One array ref per input row: label first, then the values.
my @data = map [split], <DATA>;
my @transpose = transpose(@data);

# Collect [mean, stdev] per data column (row 0 of the transpose holds the labels).
my @stats;
for my $row (@transpose[1 .. $#transpose]) {
    my $stat = Statistics::Descriptive::Full->new;
    $stat->add_data($row);
    push @stats, [$stat->mean, $stat->standard_deviation];
}

# Build the standardized table, keeping each row's label.
my @new;
for my $r (0 .. $#data) {
    my @tmp;
    for my $c (1 .. $#{$data[$r]}) {
        push @tmp, ($data[$r][$c] - $stats[$c-1][0]) / $stats[$c-1][1];
    }
    push @new, [$data[$r][0], map {sprintf "%.3f", $_} @tmp];
}

# output loop
for my $row (@new) {
    print join("\t", @$row), "\n";
}

# Swap rows and columns of an array of arrays.
sub transpose {
    my @array = @_;
    my @trans;
    for my $i (0 .. $#array) {
        for my $j (0 .. $#{$array[$i]}) {
            $trans[$j][$i] = $array[$i][$j];
        }
    }
    return @trans;
}
__DATA__
DOTR1 10.29006 10.06744 10.47105 10.05041 10.18407 9.770205 10.90548 10.75112
RCC2 6.699481 7.240353 7.263434 6.654058 6.86063 7.151931 6.796337 6.78525
HHPA6 7.31182 7.547056 8.338827 7.278408 7.545548 7.409964 7.149899 7.300342
PAX8 8.336847 8.651292 8.493323 8.5056 8.445139 8.651406 8.664237 8.56571
ACA1A 4.233111 4.320666 4.232803 4.390224 4.269969 4.314899 4.264211 4.142419
UBA7 8.196608 8.164725 7.361889 8.055019 8.882745 7.6884 7.835754 8.354209
OOA 5.098222 5.212986 5.301191 5.211401 5.13133 5.153725 5.269111 5.195991
ACX1 4.875679 5.01305 4.921618 4.930978 4.899562 4.92918 4.970339 4.986362
This prints:
C:\Old_Data\perlp>perl t33.pl
DOTR1 1.650 1.516 1.624 1.610 1.490 1.502 1.797 1.698
RCC2 -0.087 0.106 0.102 -0.117 -0.079 0.140 -0.085 -0.102
HHPA6 0.209 0.259 0.612 0.200 0.245 0.274 0.077 0.132
PAX8 0.705 0.810 0.686 0.824 0.669 0.920 0.770 0.706
ACA1A -1.281 -1.349 -1.335 -1.268 -1.301 -1.336 -1.244 -1.302
UBA7 0.637 0.567 0.149 0.595 0.875 0.419 0.391 0.610
OOA -0.862 -0.904 -0.829 -0.851 -0.895 -0.900 -0.784 -0.824
ACX1 -0.970 -1.004 -1.009 -0.993 -1.004 -1.017 -0.921 -0.919