Question

For example, I have a file:

key1   1212
key1   32332
key2   1212
key2   3232
key2   3232

I want to get this file:

key1   1212,32332
key2   1212,3232,3232

Answer 1

In awk:

$ awk '{a[$1]=a[$1](a[$1]==""?"":",")$2}END{for(i in a)print i,a[i]}' file
key1 1212,32332
key2 1212,3232,3232

Explained:

awk '{                                        # use awk for this kind of stuff
    a[$1]=a[$1] ( a[$1]=="" ? "" : "," ) $2   # hash on first col, append second col
}
END {                                         # after everything is hashed
    for(i in a)                               # for each entry in hash a
        print i,a[i]                          # output key and data
}' file                                       # oh yeah the file
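
For readers less familiar with awk, the same accumulate-by-key idea can be sketched in Python. This is only an illustration, not part of the awk solution: the function name collapse_by_key is made up for this example, and it assumes every line holds a key and a single value separated by whitespace.

```python
def collapse_by_key(lines):
    """Join the second field of every line that shares the same first field."""
    merged = {}  # key -> list of values, in order of appearance
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # assumption: silently skip blank or malformed lines
            continue
        key, value = fields[0], fields[1]
        merged.setdefault(key, []).append(value)
    return {key: ",".join(values) for key, values in merged.items()}


if __name__ == "__main__":
    sample = ["key1 1212", "key1 32332", "key2 1212", "key2 3232", "key2 3232"]
    for key, joined in collapse_by_key(sample).items():
        print(key, joined)
```

One difference worth noting: awk's for(i in a) visits keys in an unspecified order, while a Python dict keeps the order in which keys were first seen.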

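Edit: Instead of letting awk do the buffering (i.e. hashing into a), we could use sort to sort the file and then output each key followed by all of its data, comma-separated. Again using awk for the latter part: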

$ sort file | awk '$1!=p{printf "%s%s",(NR>1?ORS:""),$1}{printf "%s%s", ($1==p?",":OFS),$2;p=$1}END{print ""}'
key1 1212,32332
key2 1212,3232,3232

Here sort is not given any fancy parameters, but in the real world some might be needed. The awk part explained:

sort file |                            # sort the file
awk '                                  # before feeding to awk
$1!=p {                                # if key is different from previous key
    printf "%s%s",(NR>1?ORS:""),$1     # newline and print the key
}
{
    printf "%s%s", ($1==p?",":OFS),$2  # print the data comma-separated 
    p=$1                               # store key for comparing on the next round
}
END{ 
    print ""                           # finish the last line nicely
}'
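
The previous-key comparison above can also be written out in Python, which may make the control flow easier to follow. This is a rough sketch under the same assumption as the awk version (the input is already sorted by key); collapse_sorted is a hypothetical name chosen for the example.

```python
def collapse_sorted(lines):
    """Yield 'key v1,v2,...' for input lines that are already sorted by key."""
    prev_key = None
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # assumption: skip blank or malformed lines
            continue
        key, value = fields[0], fields[1]
        if prev_key is not None and key != prev_key:
            # The key changed, so the previous group is complete.
            yield prev_key + " " + ",".join(values)
            values = []
        prev_key = key
        values.append(value)
    if prev_key is not None:
        # Flush the last group, like the END block in the awk script.
        yield prev_key + " " + ",".join(values)


if __name__ == "__main__":
    sample = ["key1 1212", "key1 32332", "key2 1212", "key2 3232", "key2 3232"]
    for row in collapse_sorted(sample):
        print(row)
```

Unlike the awk script, which prints each value as soon as it is read, this sketch holds the values of the current key in memory until the key changes. It still never holds more than one group at a time.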

Answer 2

awk '{a[$1]=(a[$1]!="")?a[$1]","$2:$2}END{for(i in a){print i "\t" a[i]}}' file
key1    1212,32332
key2    1212,3232,3232

This should do it.

Answer 3

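If you want to avoid buffering the results for the whole file (for example, if the file is very large), you can use sort and Python's itertools.groupby. Create a Python script like this: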

# group.py

import itertools, sys

for k, g in itertools.groupby(sys.stdin, lambda x: x.split()[0]):
    print(k, ",".join([x.split()[1] for x in g]))

Then run:

sort file | python group.py 
key1 1212,32332
key2 1212,3232,3232
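
The sort step matters because itertools.groupby only groups consecutive items with equal keys. The small sketch below shows the difference on unsorted and sorted input; group_lines and the sample data are made up for the illustration.

```python
import itertools


def group_lines(lines):
    """Collapse consecutive lines that share the same first field."""
    out = []
    for key, group in itertools.groupby(lines, lambda x: x.split()[0]):
        out.append(key + " " + ",".join(x.split()[1] for x in group))
    return out


if __name__ == "__main__":
    unsorted_lines = ["key1 1212", "key2 1212", "key1 32332"]
    # Without sorting, key1 appears as two separate groups.
    print(group_lines(unsorted_lines))
    # After sorting, all key1 lines are adjacent and collapse into one group.
    print(group_lines(sorted(unsorted_lines)))
```

On the unsorted list, key1 is emitted twice; once the lines are sorted it is emitted once as key1 1212,32332.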

Otherwise, this quick Perl one-liner also does the job by accumulating the values in a hash:

perl -aE 'push @{$h{$F[0]}}, $F[1]; END {$"= ","; say "$_ @{$h{$_}}" for sort keys %h}' file

Output:

key1 1212,32332
key2 1212,3232,3232

Answer 4

It's not pure sh / coreutils, but consider using datamash for this task:

sed -r -e 's/[[:space:]]+/ /g' < infile.txt | datamash -t ' ' -s groupby 1 collapse 2

How can I collapse multiple strings into one by key?
