I'm trying to solve this particular problem in shell, but I haven't gotten anywhere yet... please help!
I have a file.txt with more than 30K lines in this format:
phoneNumber|ID|CITY|NAME|SURNAME1|SURNAME2|NAME SURNAME1 SURNAME2|
For example, I have this input file:
558000003|11111113B|LONDON|NAME FAKE3|SURNAME FAKE3|SURNAMEFAKE_3|NAME SURNAME1 SURNAME2|
558000002|11111112B|LONDON|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
558000001|11111111B|LONDON|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
558000003|11111113B|BERLIN|NAME FAKE3|SURNAME FAKE3|SURNAMEFAKE_3|NAME SURNAME1 SURNAME2|
557000002|11111112A|BERLIN|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
557000001|11111111A|BERLIN|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
As you can see, lines 1 and 4 are identical except for column 3. The output I want is:
558000003|11111113B|LONDON,BERLIN|NAME FAKE3|SURNAME FAKE3|SURNAMEFAKE_3|NAME SURNAME1 SURNAME2|
558000002|11111112B|LONDON|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
558000001|11111111B|LONDON|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
557000002|11111112A|BERLIN|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
557000001|11111111A|BERLIN|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
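I don't care about the order of the output lines. I tried to solve this with "awk" in a shell script, but nothing worked...
Is it possible to join lines when they match on a field?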
Answer 0: (score: 2)
Assuming the combination of $1 and $2 forms a unique key:
$ awk '
BEGIN { FS=OFS="|" }
{
    key = $1 SUBSEP $2
    keys[key]
    for (i=1; i<=NF; i++) {
        # append each distinct value once; a repeated value is skipped
        # rather than overwriting what was collected so far
        if ( !seen[key,i,$i]++ ) {
            fld[key,i] = ( ((key,i) in fld) ? fld[key,i] "," $i : $i )
        }
    }
}
END {
    for (key in keys) {
        for (i=1; i<=NF; i++) {
            printf "%s%s", fld[key,i], (i<NF?OFS:ORS)
        }
    }
}
' file
558000002|11111112B|LONDON|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
558000001|11111111B|LONDON|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
558000003|11111113B|LONDON,BERLIN|NAME FAKE3|SURNAME FAKE3|SURNAMEFAKE_3|NAME SURNAME1 SURNAME2|
557000002|11111112A|BERLIN|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
557000001|11111111A|BERLIN|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
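If the order of first appearance matters and only the city column should ever be merged, something along these lines might work. Treat it as a minimal sketch that has not been run against the real 30K-line file. The function name merge_cities and the array names first, order and cities are made up for illustration, and it still assumes that $1 and $2 together identify a person.

```shell
merge_cities() {
    awk '
        BEGIN { FS = OFS = "|" }
        {
            key = $1 SUBSEP $2
            if (!(key in first)) {          # new phone/ID pair: remember its row and position
                first[key] = $0
                order[++n] = key
                cities[key] = $3
                seen[key, $3] = 1
            } else if (!seen[key, $3]++) {  # known pair, city not listed yet
                cities[key] = cities[key] "," $3
            }
        }
        END {
            for (j = 1; j <= n; j++) {
                $0 = first[order[j]]        # restore the first row seen for this key
                $3 = cities[order[j]]       # swap in the merged city list
                print
            }
        }
    ' "$@"
}
```

Call it as merge_cities file. All columns other than the third are taken from the first row seen for each key, so differences elsewhere in later rows are silently ignored.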
Answer 1: (score: 1)
An awk way.
It prints everything in order of first occurrence (this could probably be improved or shortened):
awk -F'|' -vOFS="|" 'b[$2]{split(a[$2],c,"|");gsub(/.*/,c[3]",&",$3)}{a[$2]=$0;if(!b[$2])d[NR]=$2;b[$2]++}END{for(i=1;i<=NR;i++)if(d[i])print a[d[i]]}' file
Broken down:
awk -F'|' -vOFS="|" '
b[$2]{split(a[$2],c,"|")
gsub(/.*/,c[3]",&",$3)
}
{a[$2]=$0
if(!b[$2])d[NR]=$2
b[$2]++
}
END{for(i=1;i<=NR;i++)if(d[i])print a[d[i]]}' file
If the single-character array names are a problem:
awk -F'|' -vOFS="|" '
Count[$2]{split(Line[$2],Arr,"|")
gsub(/.*/,Arr[3]",&",$3)
}
{Line[$2]=$0
if(!Count[$2])Key[NR]=$2
Count[$2]++
}
END{for(i=1;i<=NR;i++)if(Key[i])print Line[Key[i]]}' file
558000003|11111113B|LONDON,BERLIN|NAME FAKE3|SURNAME FAKE3|SURNAMEFAKE_3|NAME SURNAME1 SURNAME2|
558000002|11111112B|LONDON|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
558000001|11111111B|LONDON|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
557000002|11111112A|BERLIN|NAME FAKE2|SURNAME FAKE2|SURNAMEFAKE_2|NAME SURNAME1 SURNAME2|
557000001|11111111A|BERLIN|NAME FAKE1|SURNAME FAKE1|SURNAMEFAKE_1|NAME SURNAME1 SURNAME2|
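The gsub call above relies on & standing for the matched text, so it ends up prefixing the cities stored so far to the current $3. Assuming that is the intent, the same idea can be sketched with a plain field assignment, which may be easier to read. The function name merge_by_id and the array names line, order and prev are made up here. Like the version above, it keys on $2 alone and does not drop repeated cities.

```shell
merge_by_id() {
    awk -F'|' -v OFS='|' '
        !($2 in line) {                     # first time this ID is seen
            order[++n] = $2
            line[$2] = $0
            next
        }
        {
            split(line[$2], prev, FS)       # prev[3] holds the cities collected so far
            $3 = prev[3] "," $3             # assigning a field rebuilds $0 using OFS
            line[$2] = $0
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

Call it as merge_by_id file. Because the stored line is overwritten each time, the other columns come from the last row seen for an ID rather than the first.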