Counting occurrences of a string in a matrix

Time: 2014-09-03 16:14:13

Tags: r perl awk

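From the table below, I want to count how many times each miRNA in column 1 has a positive value and how many times it has a negative value (column 3), and then plot the counts as a bar chart.

I ran this command, but it sums the values instead of counting the occurrences: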

awk '{x[$1 " " $2]+=$3} END{for (r in x)print r,x[r]}'

Example:

miRNA           target          value

mmu-miR-423-3p  NM_198167       0.7999
mmu-miR-744-5p  NM_001166476    0.79927
mmu-miR-423-5p  NM_146188      -0.79503
mmu-miR-423-3p  NM_172262      -0.79463
mmu-miR-3968    NM_001185020    0.79367
mmu-miR-298-5p  NM_175127       0.79357
mmu-miR-423-5p  NM_009320      -0.7934
mmu-miR-423-5p  NM_015732       0.7928
....

Desired output:

miRNA           positive           negative
mmu-miR-423-3p  1                  1
mmu-miR-423-5p  1                  2
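
The command above sums because `+=$3` adds each value into its bucket, and the bucket key is the miRNA/target pair. Counting needs `++` instead, with a key built from the miRNA and the sign of the value. A minimal sketch of that difference on three of the sample rows follows; the names `rows`, `sum_values`, `count_rows` and `sign` are introduced here for illustration only.

```shell
# Three rows taken from the sample table above.
rows='mmu-miR-423-5p  NM_146188      -0.79503
mmu-miR-423-5p  NM_009320      -0.7934
mmu-miR-423-5p  NM_015732       0.7928'

# The original command: += $3 accumulates the values themselves.
sum_values() {
    awk '{x[$1 " " $2]+=$3} END{for (r in x)print r,x[r]}'
}

# Counting variant: ++ adds one per row, keyed by miRNA and sign.
# sort is only there because the order of "for (k in n)" is unspecified.
count_rows() {
    awk '{ sign = ($3 < 0) ? "negative" : "positive"; n[$1 " " sign]++ }
         END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

printf '%s\n' "$rows" | sum_values
printf '%s\n' "$rows" | count_rows
```

The first call echoes each miRNA/target pair with its value, while the second reports one line per miRNA and sign with the number of rows that fell into it.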

3 answers:

Answer 0 (score: 2)

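$ awk '
NR > 1 { $3<0 ? neg[$1]++ : pos[$1]++ }   # skip the header row, then count by sign
END {
    fmt = "%-16s%-10s%s\n"
    printf fmt, "miRNA", "positive", "negative"
    # only miRNAs that have both positive and negative values are printed
    for (rna in pos)
        if (rna in neg)
            printf fmt, rna, pos[rna], neg[rna]
}
' file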
miRNA           positive  negative
mmu-miR-423-3p  1         1
mmu-miR-423-5p  1         2
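
The command above prints only the miRNAs that have both a positive and a negative value. If the miRNAs with a single sign are wanted too, as in the R and Perl answers, a sketch along these lines might do. It assumes the header is the first line of the input; the function name `count_signs` is a placeholder, and the pipe through `sort` is just one way to get a stable order, since awk leaves the order of `for (rna in seen)` unspecified.

```shell
# count_signs: hypothetical helper; reads the three-column table on stdin.
count_signs() {
    printf '%-16s%-10s%s\n' miRNA positive negative
    awk '
    NR == 1 { next }                  # assumption: line 1 is the header
    NF >= 3 {                         # ignore blank or truncated lines
        seen[$1] = 1                  # remember every miRNA, whatever its sign
        if ($3 < 0) neg[$1]++; else pos[$1]++
    }
    END {
        # %d turns a missing count into 0
        for (rna in seen)
            printf "%-16s%-10d%d\n", rna, pos[rna], neg[rna]
    }' | LC_ALL=C sort
}

# The eight sample rows from the question.
count_signs <<'EOF'
miRNA           target          value
mmu-miR-423-3p  NM_198167       0.7999
mmu-miR-744-5p  NM_001166476    0.79927
mmu-miR-423-5p  NM_146188      -0.79503
mmu-miR-423-3p  NM_172262      -0.79463
mmu-miR-3968    NM_001185020    0.79367
mmu-miR-298-5p  NM_175127       0.79357
mmu-miR-423-5p  NM_009320      -0.7934
mmu-miR-423-5p  NM_015732       0.7928
EOF
```

For the sample rows this lists all five miRNAs, with 0 in the column for a sign that never occurs.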

Answer 1 (score: 2)

Try R:

# ddf is the data frame holding the table (columns miRNA, target, value)
ddf$sign <- ifelse(ddf$value < 0, "neg", "pos")
with(ddf, table(miRNA, sign))
                sign
miRNA            neg pos
  mmu-miR-298-5p   0   1
  mmu-miR-3968     0   1
  mmu-miR-423-3p   1   1
  mmu-miR-423-5p   2   1
  mmu-miR-744-5p   0   1

Answer 2 (score: 1)

A Perl solution:

use strict;
use warnings;

my %dataCounter;
while (my $line = <DATA>) {
    chomp $line;
    next if $line =~ /^miRNA/;        # skip the header row
    my @data = split /\s+/, $line;    # miRNA, target, value
    if ($data[2] < 0) {
        $dataCounter{$data[0]}{neg}++;
    }
    else {
        $dataCounter{$data[0]}{pos}++;
    }
}
print "miRNA\tpositive\tnegative\n";
foreach my $key (sort keys %dataCounter) {
    # // (defined-or, Perl 5.10+) prints 0 when a sign never occurred
    print "$key\t" . ($dataCounter{$key}{pos} // 0) . "\t" . ($dataCounter{$key}{neg} // 0) . "\n";
}

__DATA__
miRNA           target          value
mmu-miR-423-3p  NM_198167       0.7999
mmu-miR-744-5p  NM_001166476    0.79927
mmu-miR-423-5p  NM_146188      -0.79503
mmu-miR-423-3p  NM_172262      -0.79463
mmu-miR-3968    NM_001185020    0.79367
mmu-miR-298-5p  NM_175127       0.79357
mmu-miR-423-5p  NM_009320      -0.7934
mmu-miR-423-5p  NM_015732       0.7928