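I want to remove duplicate entries in column 1, keeping only the first instance of each value, while leaving the remaining columns untouched.
Input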
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
444444 116,118,124-125,120,122-123,126,132.
444444 25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
444444 110,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 116,118,124-125,120,122-123,126,132.
Output
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
110,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122.
21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
I tried:
awk '!NF {print;next}; !($1 in a) {a[$1];print}' file
I also tried splitting the file into two parts:
file 1: the first column, with duplicates removed and the first instance kept > output1
file 2: the second column
paste output1 file2 > file-output
Is there a simple awk one-liner that does this?
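The split-and-paste idea above can be made to work, provided the first-column file keeps one line per input line (an empty line for each repeat rather than a deleted line); otherwise paste would pair keys with the wrong rows. A rough sketch, assuming single-space-separated fields; the sample data and the file names sample.txt, col1.txt, and rest.txt are made up for illustration:

```shell
# Work in a throwaway directory; all file names here are illustrative.
dir=$(mktemp -d)
printf '%s\n' '444444 21-84,' '444444 116.' '111111 25-84,' > "$dir/sample.txt"

# Column 1: first instance of each key, empty line for every repeat,
# so the line count still matches the original file.
awk '{ print (seen[$1]++ ? "" : $1) }' "$dir/sample.txt" > "$dir/col1.txt"

# Everything after the first space-separated field.
cut -d' ' -f2- "$dir/sample.txt" > "$dir/rest.txt"

# paste joins corresponding lines with a tab by default.
out=$(paste "$dir/col1.txt" "$dir/rest.txt")
printf '%s\n' "$out"
rm -rf "$dir"
```

Note that the result is tab-separated, so it does not reproduce the original spacing; it only shows that the two-file route needs the line counts to stay equal.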
Answer 0 (score: 2)
This awk command may work for you:
awk 'seen[$1]++{$1="\t\t"} 1' file
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122.
232323 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
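One side effect worth knowing about: assigning to $1 makes awk rebuild the record from its fields joined by OFS (a single space by default), so any runs of spaces between the original fields are collapsed on the modified lines. A small illustration with the same command on made-up input:

```shell
# Two lines with the same key and irregular spacing between fields.
out=$(printf '%s\n' 'k1   a    b' 'k1   c    d' |
  awk 'seen[$1]++ { $1 = "\t\t" } 1')
printf '%s\n' "$out"
```

The first line passes through untouched, while the second comes out as two tabs followed by `c d` with single spaces, because only the modified record is rebuilt.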
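Answer 1 (score: 1)
If your Input_file is sorted by the first column, as in the sample you showed, the following may help. The key is copied into a variable before $1 is overwritten, because prev has to hold the unmodified value for the next comparison.
awk '{key=$1} key==prev{$1=" "} {prev=key} 1' Input_file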
Second solution: if your Input_file is not sorted, the following may help.
awk '++a[$1]>1{$1=" "} 1' Input_file
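The difference between the two approaches shows up when a key reappears after other keys: comparing against the previous line only catches adjacent repeats, while the array catches a repeat anywhere in the file. A small sketch on made-up input, using a previous-line comparison that saves the key before $1 is changed (the variable names are illustrative):

```shell
input=$(printf '%s\n' 'a 1' 'b 2' 'a 3' 'a 4')

# Previous-line comparison: blanks a key only when it repeats on consecutive lines.
adjacent=$(printf '%s\n' "$input" |
  awk '{ key = $1 } key == prev { $1 = " " } { prev = key } 1')

# Array lookup: blanks every repeat, wherever it occurs.
anywhere=$(printf '%s\n' "$input" |
  awk '++count[$1] > 1 { $1 = " " } 1')

printf 'adjacent:\n%s\nanywhere:\n%s\n' "$adjacent" "$anywhere"
```

With this input the first variant still prints `a 3` in full, whereas the second blanks the key on both the third and fourth lines.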
Answer 2 (score: 1)
To keep the formatting of the lines, you can try:
awk '$1!=prev{prev=new=$1;gsub("."," ",new);print;next}{sub($1,new)}1' input
If $1 contains regexp metacharacters:
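awk '
$1 != prev {                      # first line of a new key: print it unchanged
    prev = new = $1
    gsub(/./, " ", new)           # new = as many spaces as the key is long
    print
    next
}
{                                 # repeated key: match it literally, one char at a time
    n = split($1, chars, "")      # empty separator splits into characters (gawk, mawk, BWK awk)
    re = ""
    for (j = 1; j <= n; j++)
        re = re "[" chars[j] "]"  # [x] matches x literally; a key containing ^ or \ would still need extra care
    sub(re, new)
}
1' input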
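Another way to sidestep regexp metacharacters is to avoid sub() on the key altogether and cut the key off by length. A minimal sketch, assuming lines have no leading whitespace and equal keys are adjacent (as in the commands above); the sample keys are made up:

```shell
out=$(printf '%s\n' 'a.b*  1' 'a.b*  2' 'axb*  3' |
  awk '
    $1 == prev {
        pad = $1
        gsub(/./, " ", pad)                      # spaces, one per key character
        print pad substr($0, length($1) + 1)     # rest of the line, spacing intact
        next
    }
    { prev = $1; print }                         # new key: print unchanged
  ')
printf '%s\n' "$out"
```

Because the keys are compared as strings and never used as a pattern, `a.b*` and `axb*` are treated as different keys and the second line keeps its original column position.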
Answer 3 (score: 1)
Anything that modifies $1 also modifies the whole record, because awk rebuilds it. The way to do exactly what you asked is:
$ awk 'seen[$1]++{rep=$1; gsub(/./," ",rep); sub(/[^[:space:]]+/,rep)} 1' file
444444 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
110,118,124-125,120,122-123,126,132.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122.
21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323 20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
116,118,124-125,120,122-123,126,132.
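The above removes only the duplicated $1 values and leaves everything else, including the whitespace within and between fields, exactly as it was.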
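To see the whitespace preservation in action, the same command can be run on a small made-up input with irregular spacing. This is only an illustration and assumes an awk that supports POSIX character classes such as [:space:] (gawk and recent mawk do):

```shell
out=$(printf '%s\n' 'k1   a    b' 'k1   c    d' 'k2 e' |
  awk 'seen[$1]++ { rep = $1; gsub(/./, " ", rep); sub(/[^[:space:]]+/, rep) } 1')
printf '%s\n' "$out"
```

On the second line the key `k1` is overwritten with two spaces in place, so `c` and `d` stay in the same columns as `a` and `b` above them.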