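From the table below, I want to count how many times each miRNA in column 1 occurs with a positive value and with a negative value (column 3), and then plot the counts as a bar chart.
I tried the following command, but it sums the values instead of counting the occurrences: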
awk '{x[$1 " " $2]+=$3} END{for (r in x)print r,x[r]}'
Example:
miRNA target value
mmu-miR-423-3p NM_198167 0.7999
mmu-miR-744-5p NM_001166476 0.79927
mmu-miR-423-5p NM_146188 -0.79503
mmu-miR-423-3p NM_172262 -0.79463
mmu-miR-3968 NM_001185020 0.79367
mmu-miR-298-5p NM_175127 0.79357
mmu-miR-423-5p NM_009320 -0.7934
mmu-miR-423-5p NM_015732 0.7928
....
output:
miRNA positive negative
mmu-miR-423-3p 1 1
mmu-miR-423-5p 1 2
Answer 0 (score: 2)
$ awk '
    { $3<0 ? neg[$1]++ : pos[$1]++ }
    END {
        fmt = "%-16s%-10s%s\n"
        printf fmt, "miRNA", "positive", "negative"
        for (rna in pos)
            if (rna in neg)
                printf fmt, rna, pos[rna], neg[rna]
    }
' file
miRNA           positive  negative
mmu-miR-423-3p  1         1
mmu-miR-423-5p  1         2
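The command above prints only the miRNAs that have both a positive and a negative count, which matches the two rows shown in the question. If every miRNA should be listed, with 0 where a sign never occurs, one possible variant is sketched below. The function name count_signs is hypothetical, and the sketch assumes that the first line is a header and that a value of exactly 0 counts as positive, as in the command above.

```shell
# Hypothetical helper: count positive and negative values per miRNA.
# Reads whitespace-separated "miRNA target value" rows from files or stdin.
count_signs() {
    printf '%-16s%-10s%s\n' miRNA positive negative
    awk '
        NR == 1 { next }            # assumption: the first line is a header
        NF < 3  { next }            # skip blank or truncated lines such as "...."
        {
            seen[$1] = 1            # remember every miRNA, whatever the sign
            if ($3 + 0 < 0) neg[$1]++
            else            pos[$1]++
        }
        END {
            # A sign that never occurred is unset and prints as 0 with %d.
            for (rna in seen)
                printf "%-16s%-10d%d\n", rna, pos[rna], neg[rna]
        }
    ' "$@" | LC_ALL=C sort          # for-in order is unspecified, so sort the rows
}

# Usage with the sample rows from the question:
count_signs <<'EOF'
miRNA target value
mmu-miR-423-3p NM_198167 0.7999
mmu-miR-744-5p NM_001166476 0.79927
mmu-miR-423-5p NM_146188 -0.79503
mmu-miR-423-3p NM_172262 -0.79463
mmu-miR-3968 NM_001185020 0.79367
mmu-miR-298-5p NM_175127 0.79357
mmu-miR-423-5p NM_009320 -0.7934
mmu-miR-423-5p NM_015732 0.7928
EOF
```

On the sample this should list all five miRNAs, including those that have no negative values. The three-column result can then be passed to whatever plotting tool is used for the bar chart.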
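Answer 1 (score: 2)
Try R:
# Assumption: the table is stored in a file named "file" (as in the awk answer) and has a header row.
ddf <- read.table("file", header = TRUE)
ddf$sign = ifelse(ddf$value<0,"neg","pos")
with(ddf, table(miRNA, sign))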
sign
miRNA neg pos
mmu-miR-298-5p 0 1
mmu-miR-3968 0 1
mmu-miR-423-3p 1 1
mmu-miR-423-5p 2 1
mmu-miR-744-5p 0 1
Answer 2 (score: 1)
A Perl solution:
use strict;
use warnings;

my %dataCounter;
while (my $line = <DATA>) {
    chomp $line;
    next if $line =~ /^miRNA/;          # skip the header row
    my @data = split /\s+/, $line;
    # Column 3 (index 2) holds the value; tally by its sign.
    if ($data[2] < 0) {
        $dataCounter{$data[0]}{neg}++;
    }
    else {
        $dataCounter{$data[0]}{pos}++;
    }
}
print "miRNA\tpositive\tnegative\n";
foreach my $key (sort keys %dataCounter) {
    # Default a missing count to 0 (the // operator needs Perl 5.10 or later).
    print "$key\t" . ($dataCounter{$key}{pos} // 0) . "\t" . ($dataCounter{$key}{neg} // 0) . "\n";
}
__DATA__
miRNA target value
mmu-miR-423-3p NM_198167 0.7999
mmu-miR-744-5p NM_001166476 0.79927
mmu-miR-423-5p NM_146188 -0.79503
mmu-miR-423-3p NM_172262 -0.79463
mmu-miR-3968 NM_001185020 0.79367
mmu-miR-298-5p NM_175127 0.79357
mmu-miR-423-5p NM_009320 -0.7934
mmu-miR-423-5p NM_015732 0.7928