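If column 2 is empty, I need to skip all rows that share the same first column; for the remaining rows, I need to compute column 4 as a fraction of column 3. How can I do this?
Input: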
T75PA 2 0
T75PA kk 4 1
T240P 4 3
T240P test 3 3
T240P test2 3 1
T245P rr 8 1
T245P rr 33 1
T226PA fg 4 2
T226PA g 51 38
T226PA e 41 34
Output:
T245P rr 8 1 0.125
T245P rr 33 1 0.03030303
T226PA fg 4 2 0.5
T226PA g 51 38 0.745098039
T226PA e 41 34 0.829268293
Answer 0: (score: 1)
Try:
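awk 'NF < 4 { for (i in res) if (key[i] == $1) delete res[i]  # drop rows already stored for this key
              rm[$1]; next }                                   # remember the key and skip the row
     $1 in rm { next }
     { key[NR] = $1; res[NR] = $0 "\t" $4/$3 }
     END { for (i = 1; i <= NR; i++) if (i in res) print res[i] }' file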
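This skips every line with fewer than four fields, together with the other lines that share its first column. For each remaining line it computes the ratio, appends it to the line, and stores the result in the array res. Once the file has been processed, the entries of res are printed to stdout.
Output: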
T245P rr 8 1 0.125
T245P rr 33 1 0.030303
T226PA fg 4 2 0.5
T226PA g 51 38 0.745098
T226PA e 41 34 0.829268
HTH Chris
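The steps described in this answer (skip short rows and every row sharing their key, append the ratio, print at the end) can be checked against a few rows of the sample input. This is only a minimal sketch, assuming whitespace-separated fields, so that an empty second column shows up as a row with fewer than four fields; the function name `ratio_rows` is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: reads rows on stdin and prints the surviving rows
# with the ratio column4/column3 appended after a tab.
ratio_rows() {
  awk '
    NF < 4 { bad[$1]; next }                        # empty column 2: remember the key
           { key[NR] = $1; row[NR] = $0 "\t" $4/$3 } # store the row with its ratio
    END    { for (i = 1; i <= NR; i++)              # print in input order
               if ((i in row) && !(key[i] in bad)) print row[i] }
  '
}

# Assumption: whitespace-separated input, as the sample appears in the question.
printf '%s\n' 'T75PA 2 0' 'T75PA kk 4 1' 'T245P rr 8 1' 'T226PA fg 4 2' | ratio_rows
```

Because the keys are filtered only in the END block, a row with an empty second column removes its whole group even when it appears after the other rows of that group.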
Answer 1: (score: 1)
I assume your data is tab-delimited. A Perl script like this (I haven't tested it)...
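use strict;
use warnings;

my @data;
my %blanks;
while ( my $line = <STDIN> )
{
    chomp($line);    # chomp removes only the trailing newline; chop removes any last character
    next unless length $line;
    my @rec = split( "\t", $line );
    push( @data, \@rec );
    # Remember every key that has at least one row with an empty second column.
    if ( !defined $rec[1] || $rec[1] eq '' )
    {
        $blanks{ $rec[0] }++;
    }
}
foreach my $rec ( @data )
{
    # Print only rows whose key never appeared with an empty second column.
    if ( !$blanks{ $rec->[0] } )
    {
        print join( "\t", @$rec, $rec->[3] / $rec->[2] ) . "\n";
    }
}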
Answer 2: (score: 1)
How about:
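#!/usr/bin/perl
use Modern::Perl;
# Column 2 is optional, so a row with an empty second column still matches.
my $re = qr/^([A-Z0-9]+)\s+(?:(\S+)\s+)?(\d+)\s+(\d+)\s*$/;
my $skip = '';
while (<DATA>) {
    chomp;
    if (my @l = $_ =~ /$re/) {
        # Skip a row with an empty column 2 and the rows that follow it with the same key.
        if (!defined $l[1] || $skip eq $l[0]) {
            $skip = $l[0];
            next;
        }
        $skip = '';
        my $r = $l[3] / $l[2];
        say "$_\t$r";
    }
}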
__DATA__
T75PA 2 0
T75PA kk 4 1
T240P 4 3
T240P test 3 3
T240P test2 3 1
T245P rr 8 1
T245P rr 33 1
T226PA fg 4 2
T226PA g 51 38
T226PA e 41 34
Output:
T245P rr 8 1 0.125
T245P rr 33 1 0.0303030303030303
T226PA fg 4 2 0.5
T226PA g 51 38 0.745098039215686
T226PA e 41 34 0.829268292682927
Answer 3: (score: 1)
awk '
NR==FNR {if (NF < 4) blank[$1]; next}
$1 in blank {next}
{$(NF+1) = $4/$3; print}
' datafile datafile | column -t
Since you now say the field separator is a tab:
awk '
BEGIN {OFS = FS = "\t"}
NR==FNR {if ($2 == "") blank[$1]; next}
$1 in blank {next}
{$5 = $4/$3; print}
' datafile datafile
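Both commands in this answer pass the same file to awk twice. While `NR==FNR` holds, awk is still reading the first copy and only collects the keys whose second column is empty; on the second copy it prints the remaining rows with the ratio appended. A minimal sketch of the tab-separated variant, runnable on its own: the function name `two_pass_ratio` and the temporary file are hypothetical, and the three rows are a tab-separated rendering of rows from the question's sample.

```shell
# Hypothetical helper: two passes over the tab-separated file named in $1.
two_pass_ratio() {
  awk '
    BEGIN          { FS = OFS = "\t" }
    NR == FNR      { if ($2 == "") blank[$1]; next }  # pass 1: collect keys with an empty column 2
    !($1 in blank) { $5 = $4 / $3; print }            # pass 2: append the ratio to the other rows
  ' "$1" "$1"
}

# Assumption: fields are separated by single tabs, so the empty column is visible to awk.
tmp=$(mktemp)
printf 'T75PA\t\t2\t0\nT75PA\tkk\t4\t1\nT245P\trr\t8\t1\n' > "$tmp"
two_pass_ratio "$tmp"
rm -f "$tmp"
```

Reading the file twice avoids holding the rows in memory, at the cost of requiring a real file rather than a pipe.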