For example, I have a file of key/value lines in which the same key can appear on several lines. I want to get a file in which the values for each key are joined with commas:
key1 1212,32332
key2 1212,3232,3232
Answer 0 (score: 1)
In awk:
$ awk '{a[$1]=a[$1](a[$1]==""?"":",")$2}END{for(i in a)print i,a[i]}' file
key1 1212,32332
key2 1212,3232,3232
Explained:
awk '{ # use awk for this kind of stuff
a[$1]=a[$1] ( a[$1]=="" ? "" : "," ) $2 # hash on the first column and append the second, comma-separated
}
END { # after everything is hashed
for(i in a) # for each entry in hash a
print i,a[i] # output key and data
}' file # oh yeah the file
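For readers less familiar with awk, the same hash-accumulation idea can be sketched in Python. This is only an illustration, not part of the original answer: the function name group_by_key and the sample lines are assumptions, with the sample chosen so that it produces the output shown above.

```python
from collections import defaultdict


def group_by_key(lines):
    """Collect the second column of each line under its first column."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[0]].append(fields[1])
    # Join the collected values with commas, like the awk array a[] does.
    return {key: ",".join(values) for key, values in groups.items()}


# Assumed sample input: one "key value" pair per line.
sample = ["key1 1212", "key1 32332", "key2 1212", "key2 3232", "key2 3232"]

for key, joined in group_by_key(sample).items():
    print(key, joined)
```

As with the awk version, everything is held in memory until the end, so this suits files that fit comfortably in RAM.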
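Edit: Instead of letting awk do the buffering (i.e. hashing into a), we can use sort to sort the file and then output each key followed by all of its data, comma-separated. Again using awk for the latter part: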
$ sort file | awk '$1!=p{printf "%s%s",(NR>1?ORS:""),$1}{printf "%s%s", ($1==p?",":OFS),$2;p=$1}END{print ""}'
key1 1212,32332
key2 1212,3232,3232
Here sort is given no fancy parameters, but in the real world some might be needed. The awk part explained:
sort file | # sort the file before feeding it to awk
awk '
$1!=p { # if key is different from previous key
printf "%s%s",(NR>1?ORS:""),$1 # newline and print the key
}
{
printf "%s%s", ($1==p?",":OFS),$2 # print the data comma-separated
p=$1 # store key for comparing on the next round
}
END{
print "" # finish the last line nicely
}'
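The streaming logic of that awk script (remember the previous key, emit a line whenever the key changes) can be sketched in Python as follows. The name merge_sorted and the sample data are assumptions made for illustration; the input must already be sorted by key, which is what sort provides in the pipeline.

```python
def merge_sorted(lines):
    """Yield 'key v1,v2,...' for input that is already sorted by key."""
    prev = None   # key of the group currently being collected
    values = []   # values seen so far for that key
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key, value = fields[0], fields[1]
        if prev is not None and key != prev:
            # Key changed: the previous group is complete, so emit it.
            yield prev + " " + ",".join(values)
            values = []
        prev = key
        values.append(value)
    if prev is not None:
        yield prev + " " + ",".join(values)  # emit the final group


# Assumed sample input, sorted here in place of the external sort command.
sample = sorted(["key2 3232", "key1 1212", "key2 1212", "key1 32332", "key2 3232"])
for merged in merge_sorted(sample):
    print(merged)
```

Only one group is held in memory at a time, which is the point of sorting first.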
Answer 1 (score: 0)
awk '{a[$1]=(a[$1]!="")?a[$1]","$2:$2}END{for(i in a){print i "\t" a[i]}}' file
key1 1212,32332
key2 1212,3232,3232
That should do it.
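Answer 2 (score: 0)
If you want to avoid buffering the results for the whole file (for example, if the file is very large), you can use sort together with Python's itertools.groupby. Create a Python script like this: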
# group.py
import itertools, sys

# groupby only merges consecutive lines, so stdin must already be sorted by key
for k, g in itertools.groupby(sys.stdin, lambda x: x.split()[0]):
    print(k, ",".join([x.split()[1] for x in g]))
Then run:
sort file | python group.py
key1 1212,32332
key2 1212,3232,3232
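The sort step matters because itertools.groupby only groups consecutive items with the same key. The small sketch below shows the difference; the helper names first_field and collapse and the three sample lines are assumptions made for illustration.

```python
import itertools


def first_field(line):
    """Return the key, i.e. the first whitespace-separated field."""
    return line.split()[0]


def collapse(lines):
    """Group consecutive lines by key and join their second fields."""
    return [
        (key, ",".join(item.split()[1] for item in group))
        for key, group in itertools.groupby(lines, first_field)
    ]


# Assumed sample in which the key1 lines are not adjacent.
unsorted_lines = ["key1 1212", "key2 1212", "key1 32332"]

print(collapse(unsorted_lines))          # key1 shows up as two separate groups
print(collapse(sorted(unsorted_lines)))  # after sorting, key1 is a single group
```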
Otherwise, this quick Perl one-liner also does the job by accumulating the values in a hash:
perl -aE 'push @{$h{$F[0]}}, $F[1]; END {$"= ","; say "$_ @{$h{$_}}" for sort keys %h}' file
Output:
key1 1212,32332
key2 1212,3232,3232
Answer 3 (score: -1)
It is not pure sh / coreutils, but consider using datamash for this task:
sed -r -e 's/[[:space:]]+/ /g' < infile.txt | datamash -t ' ' -s groupby 1 collapse 2