awk with duplicate values

Date: 2014-01-30 21:53:19

Tags: bash scripting awk

File:

22 Hello
22 Hi
1  What
34 Where
21 is
44 How
44 are
44 you

Expected output:

22 HelloHi
1  What
34 Where
21 is
44 Howareyou

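If the first field ($1) contains duplicate values, the second fields of those lines should be concatenated.

How can I achieve this with awk?

Thanks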

5 answers:

Answer 0 (score: 10)

$ awk '
!seen[$1]++ { keys[++numKeys] = $1 } 
{ str[$1] = str[$1] $2 }
END{
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key, str[key]
    }
}
' file
22 HelloHi
1 What
34 Where
21 is
44 Howareyou

Answer 1 (score: 6)

Using awk:

awk '!($1 in a){a[$1]=$2;next} $1 in a{a[$1]=a[$1] $2} END{for (i in a) print i, a[i]}' file
22 HelloHi
44 Howareyou
34 Where
21 is
1 What

Edit: to preserve the order:

awk '!($1 in a){b[++n]=$1; a[$1]=$2;next} $1 in a{a[$1] = a[$1] $2}
        END{for (i=1; i<=n; i++) print b[i], a[b[i]]}' file
22 HelloHi
1 What
34 Where
21 is
44 Howareyou

Answer 2 (score: 5)

To preserve the order, you need to keep track of it:

awk '
    ! seen[$1]++ {order[++n] = $1}
    {value[$1] = value[$1] $2}
    END {for (i=1; i<=n; i++) print order[i], value[order[i]]}
' <<END
22 Hello
22 Hi
1  What
34 Where
21 is
44 How
44 are
44 you
END
22 HelloHi
1 What
34 Where
21 is
44 Howareyou

If you know that equal values in column 1 are always contiguous (as in the sample text), then:

awk '
    prev != $1 {printf "%s%s ", sep, $1; sep=RS} 
    {printf "%s", $2; prev = $1} 
    END {print ""}
'
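
The contiguous-keys shortcut above can also be sketched in Python with itertools.groupby, which merges only adjacent lines that share a key, much like the awk version. This is an illustrative sketch rather than part of the original answer: the function name merge_adjacent is made up for this example, and it assumes every non-blank line has at least two whitespace-separated fields.

```python
from itertools import groupby


def merge_adjacent(lines):
    """Concatenate the second field of adjacent lines that share a first field."""
    # Split each non-blank line into its whitespace-separated fields.
    rows = (line.split() for line in lines if line.strip())
    merged = []
    # groupby only groups *consecutive* rows with equal keys, so nothing
    # needs to be remembered beyond the current run of lines.
    for key, group in groupby(rows, key=lambda fields: fields[0]):
        merged.append(key + " " + "".join(fields[1] for fields in group))
    return merged


if __name__ == "__main__":
    sample = ["22 Hello", "22 Hi", "1  What", "34 Where",
              "21 is", "44 How", "44 are", "44 you"]
    print("\n".join(merge_adjacent(sample)))
```

If the same key reappears later in the input, it starts a new output line, which is the same limitation the awk one-liner has.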

A couple of other approaches:

perl -lane '
        push @keys, $F[0] unless grep {$_ eq $F[0]} @keys;
        $val{$F[0]} .= $F[1]
    } END {
        print "$_ $val{$_}" for @keys
' file

And, venturing into niche territory:

#!/usr/bin/env tclsh
while {[gets stdin line] != -1} {dict append val {*}$line}
dict for {k v} $val {puts "$k $v"}

Answer 3 (score: 1)

Here is an alternative solution in Python (Python 2 syntax), as requested by @shellter:

from collections import defaultdict

with open("file") as infile:
    d = defaultdict(str)
    #Build dictionary of values
    for line in infile:
        line = line.strip()
        k, _, v = line.partition(" ")
        d[k] += v
    #Print everything
    for k, v in d.iteritems():
        print k,v

Note that this solution does not preserve the order. Here is an alternative that produces exactly the desired output:

from collections import defaultdict

with open("file") as infile:
    d = defaultdict(str)
    orig_order = []
    #Build dictionary of values
    for line in infile:
        line = line.strip()
        k, _, v = line.partition(" ")
        d[k] += v
        #Add to original order if not seen yet
        if not k in orig_order:
            orig_order.append(k)
    #Print everything
    for k in orig_order:
        print k, d[k]

Note that these are quickly written solutions; I'm sure they could be made shorter or more flexible with little effort.
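
The two scripts above use Python 2 syntax (the print statement and iteritems). As a rough sketch of how the same idea might look in Python 3.7 or later, where plain dicts are guaranteed to iterate in insertion order, the separate order list is no longer needed. The function name merge_by_key is made up for this example, and like awk's $2 it only uses the second whitespace-separated field of each line.

```python
def merge_by_key(lines):
    """Concatenate second fields per first field, keeping first-seen key order."""
    merged = {}  # plain dicts preserve insertion order in Python 3.7+
    for line in lines:
        fields = line.split()  # split on runs of whitespace, like awk
        if len(fields) < 2:
            continue  # skip blank lines and lines without a value
        key, value = fields[0], fields[1]
        merged[key] = merged.get(key, "") + value
    return merged


if __name__ == "__main__":
    sample = ["22 Hello", "22 Hi", "1  What", "34 Where",
              "21 is", "44 How", "44 are", "44 you"]
    for key, value in merge_by_key(sample).items():
        print(key, value)
```

Unlike the groupby-style approach, this works even when equal keys are not adjacent, at the cost of holding all keys in memory.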

Answer 4 (score: 0)

If the order doesn't matter, this will work:

awk '{a[$1]=a[$1]$2}; END {for (i in a) {print i, a[i]}}' file

...and if the order does matter:

awk '{if (!($1 in a)) b[++i]=$1; a[$1]=a[$1]$2}; END {for (j=1;j<=i;j++) {print b[j], a[b[j]]}}' file