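I read biological sequences from many .txt files into my @col array, and I need the frequency with which each letter (A, C, G, T) occurs at each position. That all works, but I would like to change the layout of the output.
The current output is: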
A 1.112 1.124 1.258
C 1.154 1.122 1.587
G 1.158 1.454 1.478
T 1.154 1.125 1.478
But I want to transpose it, i.e. turn the rows into columns, like this:
A C G T
1.112 1.154 1.158 1.154
and so on.
Code:
@col = qw( GTGTCCATTAGAGGGCGCCA GCAGCCTCCTGAGGACGCCA GAGACCTCAAGGGGCCACTA GGGGCCACTAGGGGGCTCGA ATGGCCACAAGAGGGCGTCA CTGCCCGCCCGGCGGCGCCG GCGGGCAGCAGGGGGAGCCG ATCACCACCAGGTGGCGCCG AAGGACACTAGGTGGAGCCA TCGGCCGGCAGAGGGCGCTG ATGACCGCCAGGGGTCGCTC ACCACCAGCAGGGGGCACCT GCAGCCCGTGGGGGGCGCCG GTGGGCGGCAGGGGGCGCTG CCAGCCTCTAGGGGCCACTG TTGACCACCAGATGGTGGTA CCTGCCGAAAGGGGGCAGTG ); # and so on
foreach my $row (@col)
{
    # take the one-letter substrings of the row, i.e. pos 1, pos 2, ..., and count them
    ++$pwm{ substr $row, $_, 1 }[ $_ ] for 0 .. length( $row ) - 1;
}
@col = (); # need an empty array for the code above
@$_ = map{ $_ ? ($_/$row_counter)+1 : 1 } @$_ for values %pwm;
print "$_ @{ $pwm{$_}}\n" for sort keys %pwm;
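Answer 0 (score: 3)
This seems to do what you want, although I'm surprised that you want the frequencies to run from 1 to 2 rather than from 0 to 1.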
use strict;
use warnings 'all';
my @col = qw/
GTGTCCATTAGAGGGCGCCA
GCAGCCTCCTGAGGACGCCA
GAGACCTCAAGGGGCCACTA
GGGGCCACTAGGGGGCTCGA
ATGGCCACAAGAGGGCGTCA
CTGCCCGCCCGGCGGCGCCG
GCGGGCAGCAGGGGGAGCCG
ATCACCACCAGGTGGCGCCG
AAGGACACTAGGTGGAGCCA
TCGGCCGGCAGAGGGCGCTG
ATGACCGCCAGGGGTCGCTC
ACCACCAGCAGGGGGCACCT
GCAGCCCGTGGGGGGCGCCG
GTGGGCGGCAGGGGGCGCTG
CCAGCCTCTAGGGGCCACTG
TTGACCACCAGATGGTGGTA
CCTGCCGAAAGGGGGCAGTG
/;
my %pwm;
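for ( @col ) {
    my @row = split //;    # one element per letter
    for my $i ( 0 .. $#row ) {
        my $k = $row[$i];
        ++$pwm{$k}[$i];    # count letter $k at position $i
    }
}
# Turn counts into frequencies; a position where a letter never occurred is undef, so treat it as 0
for my $counts ( values %pwm ) {
    for my $count ( @$counts ) {
        $count = ( $count // 0 ) / @col + 1;
    }
}
my @keys = sort keys %pwm;
my $fmt = '%-5s ' x @keys . "\n";    # header row: one column per letter
printf $fmt, @keys;
$fmt = '%.3f ' x @keys . "\n";       # data rows: one row per position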
# Loop over the sequence positions (all sequences are assumed to have the same length), not over the number of sequences
for my $i ( 0 .. length( $col[0] ) - 1 ) {
    printf $fmt, map { $pwm{$_}[$i] } @keys;
}
Output:
A     C     G     T
1.294 1.176 1.412 1.118
1.118 1.412 1.059 1.412
1.176 1.118 1.647 1.059
1.294 1.059 1.588 1.059
1.059 1.824 1.118 1.000
1.000 2.000 1.000 1.000
1.471 1.059 1.294 1.176
1.059 1.588 1.294 1.059
1.176 1.529 1.000 1.294
1.824 1.059 1.059 1.059
1.000 1.000 2.000 1.000
1.294 1.000 1.706 1.000
1.000 1.059 1.765 1.176
1.000 1.000 2.000 1.000
1.059 1.118 1.765 1.059
1.118 1.824 1.000 1.059
1.235 1.000 1.706 1.059
1.000 1.824 1.118 1.059
1.000 1.529 1.059 1.412
1.412 1.059 1.471 1.059
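If plain relative frequencies from 0 to 1 are what is actually wanted, something along the following lines might do. This is only a sketch under a few assumptions: the helper name position_frequencies is made up for illustration and is not part of the original code, all sequences are assumed to have the same length, and the four-sequence sample is shortened test data rather than the real input. Building one row per position directly also means no separate transpose step is needed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns one array ref
# per sequence position, holding the relative frequency (0 to 1) of
# A, C, G and T at that position, in that order.
sub position_frequencies {
    my @seqs    = @_;
    my @letters = qw/ A C G T /;
    my $n       = @seqs;              # number of sequences
    my $len     = length $seqs[0];    # assumes every sequence has this length

    my @rows;
    for my $i ( 0 .. $len - 1 ) {
        # Start every letter at 0 so letters that never occur still print
        my %count = map { $_ => 0 } @letters;
        ++$count{ substr $_, $i, 1 } for @seqs;
        push @rows, [ map { $count{$_} / $n } @letters ];
    }
    return @rows;
}

# Shortened sample data, for illustration only
my @col  = qw/ GTGT GCAG GAGA ATCA /;
my @rows = position_frequencies(@col);

printf '%-5s ' x 4 . "\n", qw/ A C G T /;
printf '%.3f ' x 4 . "\n", @$_ for @rows;
```

Adding 1 to each value inside the final map would reproduce the 1-to-2 range used above.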