Print lines after uniq with a column condition

Asked: 2014-02-27 09:04:45

Tags: linux bash sorting awk uniq

I have a file with the following contents:

192.168.168.23 pg.something
181.135.56.13 pg.nothing
15.123.96.12 l.everything
15.151.15.3 f.something
15.151.15.3 pg.something
64.196.12.34 pg.nothing
15.123.96.12 l.everything
181.168.56.13 pg.nothing
192.168.168.23 pg.something
192.168.168.23 l.everything
192.12.56.152 l.everything
181.135.56.13 pg.nothing
64.196.12.34 pg.nothing
64.196.12.34 pg.something
181.135.56.13 pg.nothing
64.196.12.34 l.everything

I am trying to find the hit count for each user under each IP, sorted by IP.

I tried this:

for i in `cat test_file |awk '{print $1}'|sort |uniq -c |sort -rn |awk '{print $2}'`; do grep $i test_file;done |uniq -c |awk '{print $2,$3,$1}'

and got:

64.196.12.34 pg.nothing 2
64.196.12.34 pg.something 1
64.196.12.34 l.everything 1
192.168.168.23 pg.something 2
192.168.168.23 l.everything 1
181.135.56.13 pg.nothing 3
15.151.15.3 f.something 1
15.151.15.3 pg.something 1
15.123.96.12 l.everything 2
192.12.56.152 l.everything 1
181.168.56.13 pg.nothing 1

This output is fine, but I would like to know whether there is a way to make it look like this:

64.196.12.34 pg.nothing 2
             pg.something 1
             l.everything 1
192.168.168.23 pg.something 2
               l.everything 1
181.135.56.13 pg.nothing 3
15.151.15.3 f.something 1
            pg.something 1
15.123.96.12 l.everything 2
192.12.56.152 l.everything 1
181.168.56.13 pg.nothing 1

That is, only the repeated IPs are removed.

Thanks in advance.

3 Answers:

Answer 0 (score: 2)

You can replace your last awk command with the following:

awk '{if ($2!=a) {print $2"\t"$3"\t"$1} else {print "\t\t"$3"\t"$1}}{a=$2}'

which gives:

64.196.12.34    pg.nothing      2
                pg.something    1
                l.everything    1
192.168.168.23  pg.something    2
                l.everything    1
181.135.56.13   pg.nothing      3
15.151.15.3     f.something     1
                pg.something    1
15.123.96.12    l.everything    2
192.12.56.152   l.everything    1
181.168.56.13   pg.nothing      1
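
To see that awk command in isolation, here is a minimal sketch that feeds it a few hand-written lines in `uniq -c` layout (count, IP, user). The function name `blank_repeated_ip` and the sample lines are assumptions made for illustration; the awk logic is the same as in the command above, only spread over several lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "count IP user" lines (the layout produced by
# uniq -c) and prints "IP<TAB>user<TAB>count", leaving the IP column empty
# when the IP is the same as on the previous line.
blank_repeated_ip() {
    awk '{
        if ($2 != a) {
            print $2 "\t" $3 "\t" $1      # first line of a new IP: print it in full
        } else {
            print "\t\t" $3 "\t" $1       # same IP as the previous line: tabs only
        }
        a = $2                            # remember the IP for the next line
    }'
}

# Illustrative input only, not the full data set from the question.
printf '%s\n' \
    '2 64.196.12.34 pg.nothing' \
    '1 64.196.12.34 pg.something' \
    '3 181.135.56.13 pg.nothing' | blank_repeated_ip
```

The columns line up in a terminal with 8-column tab stops as long as every IP is between 8 and 15 characters long, which holds for the sample data; a shorter address such as 1.2.3.4 would end up one tab stop to the left.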

Answer 1 (score: 0)

This does the counting from scratch:

awk '
     {a[$1,$2]++; b[$1]; c[$2]}
     END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}}
    ' file | awk '
                  $1==prev {print FS $2 FS $3; next} {prev=$1; print}
                 '

The first part does the counting:

$ awk '{a[$1,$2]++; b[$1]; c[$2]} END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}}' a 
192.168.168.23 pg.something 2
192.168.168.23 l.everything 1
192.12.56.152 l.everything 1
64.196.12.34 pg.nothing 2
64.196.12.34 pg.something 1
64.196.12.34 l.everything 1
15.151.15.3 f.something 1
15.151.15.3 pg.something 1
15.123.96.12 l.everything 2
181.135.56.13 pg.nothing 3
181.168.56.13 pg.nothing 1

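Explanation

  • {a[$1,$2]++; b[$1]; c[$2]} keeps track of every combination seen: a counts each (first field, second field) pair, while b and c record the distinct first and second fields as array keys.
  • END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}} loops over all first and second fields and prints only the pairs that actually occurred, together with their counts.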
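
The counting step can also be sketched with a single array. This is a variant, not the code from the answer: it iterates over the combined keys directly and splits each one on SUBSEP, so only pairs that occurred are visited. The function name `count_pairs` and the sample input are hypothetical, and the output is piped through sort because awk does not define the order of a for-in loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts identical "IP user" lines and prints
# "IP user count", sorted so that the result is deterministic.
count_pairs() {
    awk '
        { count[$1, $2]++ }                 # the key is $1 SUBSEP $2
        END {
            for (key in count) {
                split(key, part, SUBSEP)    # part[1] = IP, part[2] = user
                print part[1], part[2], count[key]
            }
        }
    ' | LC_ALL=C sort
}

# Illustrative input only.
printf '%s\n' \
    '15.151.15.3 f.something' \
    '15.151.15.3 pg.something' \
    '15.151.15.3 f.something' | count_pairs
```

Sorting by the whole line keeps the lines of one IP together, which is what the grouping step relies on.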

From that output, the second part does the grouping:

$ awk '{a[$1,$2]++; b[$1]; c[$2]} END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}}' a | awk '$1==prev {print FS $2 FS $3; next} {prev=$1; print}'
192.168.168.23 pg.something 2
 l.everything 1
192.12.56.152 l.everything 1
64.196.12.34 pg.nothing 2
 pg.something 1
 l.everything 1
15.151.15.3 f.something 1
 pg.something 1
15.123.96.12 l.everything 2
181.135.56.13 pg.nothing 3
181.168.56.13 pg.nothing 1

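Explanation

  • $1==prev {print FS $2 FS $3; next}: if the previous line had the same first field, print only the second and third fields, each preceded by the field separator.
  • {prev=$1; print}: otherwise, remember the first field and print the line unchanged.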
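
The grouping rule prints a single separator in place of a repeated IP, so the remaining columns shift left. A possible adjustment, sketched here as an assumption rather than as part of the answer, pads the empty column to the width of the IP so that the output matches the layout requested in the question. The function name `group_by_ip` and the sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "IP user count" lines that are already grouped
# by IP and replaces each repeated IP with spaces of the same width.
group_by_ip() {
    awk '
        $1 == prev {
            pad = sprintf("%" length($1) "s", "")   # as many spaces as the IP is long
            print pad, $2, $3
            next
        }
        { prev = $1; print }                        # first line of an IP: print unchanged
    '
}

# Illustrative input only.
printf '%s\n' \
    '64.196.12.34 pg.nothing 2' \
    '64.196.12.34 pg.something 1' \
    '15.151.15.3 f.something 1' | group_by_ip
```

The width is built into the format string with string concatenation instead of a `%*s` conversion, on the assumption that this is the more portable choice across awk implementations.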

Answer 2 (score: 0)

Here is a Perl solution:

#!/usr/bin/perl

use warnings;
use strict;

my %data;

while (<DATA>) {
    chomp;
    my ($ip, $dom) = split;
    $data{$ip}->{$dom}++;
}

while(my ($ip, $doms) = each %data) {
    print "$ip\t";
    my ($dom, $cnt) = each %$doms;
    print "$dom $cnt\n";
    while (($dom, $cnt) = each %$doms) {
        print "\t\t$dom $cnt\n";
    }
    print "\n";
}

__DATA__
192.168.168.23 pg.something
181.135.56.13 pg.nothing
15.123.96.12 l.everything
15.151.15.3 f.something
15.151.15.3 pg.something
64.196.12.34 pg.nothing
15.123.96.12 l.everything
181.168.56.13 pg.nothing
192.168.168.23 pg.something
192.168.168.23 l.everything
192.12.56.152 l.everything
181.135.56.13 pg.nothing
64.196.12.34 pg.nothing
64.196.12.34 pg.something
181.135.56.13 pg.nothing
64.196.12.34 l.everything

Result:

192.12.56.152   l.everything 1

15.151.15.3     pg.something 1
                f.something 1

64.196.12.34    pg.nothing 2
                pg.something 1
                l.everything 1

181.168.56.13   pg.nothing 1

15.123.96.12    l.everything 2

192.168.168.23  pg.something 2
                l.everything 1

181.135.56.13   pg.nothing 3

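The result is not nicely aligned, but it should be easy to adjust the code so that it gives exactly the alignment shown in the question.

Here is the adapted version: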

while(my ($ip, $doms) = each %data) {
    print "$ip ";
    my ($dom, $cnt) = each %$doms;
    print "$dom $cnt\n";
    my $prefix = ' ' x (length $ip);
    while (($dom, $cnt) = each %$doms) {
        print "$prefix $dom $cnt\n";
    }
}