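Remove duplicate entries in the first column, but leave the other columns unchanged

Time: 2018-05-19 17:31:53

Tags: bash awk

I want to remove the duplicate entries in column 1, keeping only the first occurrence of each value, while leaving the rest of each line untouched.

Input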

444444              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
444444              116,118,124-125,120,122-123,126,132.                       
444444              25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
444444              110,118,124-125,120,122-123,126,132.                       
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
111111              116,118,124-125,120,122.                                   
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
232323              20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
232323              116,118,124-125,120,122-123,126,132.                       

Output

444444              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    116,118,124-125,120,122-123,126,132.                       
                    25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    110,118,124-125,120,122-123,126,132.                       
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    116,118,124-125,120,122.                                   
                    21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
232323              20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117, 
                    116,118,124-125,120,122-123,126,132.                      

I tried:

 awk '!NF {print;next}; !($1 in a) {a[$1];print}' file

I also tried splitting the file into two parts:

file 1: the first column, with duplicates removed and the first occurrence kept > output1
file 2: the second column
paste output1 file2 > file-output
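
For reference, the split-and-paste idea could be sketched roughly as below. This assumes column 1 always occupies the first 20 characters of a line, as in the sample; the three-line input and the temporary directory are made up for illustration, and `-d '\0'` is the POSIX way of asking `paste` for an empty delimiter.

```shell
# Hypothetical sample: column 1 padded to 20 characters, as in the question.
tmp=$(mktemp -d) && cd "$tmp"
printf '%-20s%s\n' 444444 '21-84,87,' 444444 '116,118.' 111111 '21-84,' > file

# Column 1 only: overwrite every repeated key with spaces of the same width.
cut -c1-20 file | awk 'seen[$1]++ { gsub(/[^ ]/, " ") } 1' > output1
# Everything after column 1, byte for byte.
cut -c21- file > file2
# Glue the two halves back together with no delimiter in between.
paste -d '\0' output1 file2 > file-output
cat file-output
```

The fixed-width assumption is the weak point: if the key column ever changes width, the `cut` ranges have to change with it.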

Is there a simple awk one-liner for this?

4 answers:

Answer 0: (score: 2)

This awk command may work for you:

awk 'seen[$1]++{$1="\t\t"} 1' file

444444   21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
         116,118,124-125,120,122-123,126,132.
111111   21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
         116,118,124-125,120,122.
232323   21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
         116,118,124-125,120,122-123,126,132.

Answer 1: (score: 1)

If your Input_file is sorted on the first column, as in your sample, the following may help.

awk '{key=$1} key==prev{$1="                   "} {prev=key} 1'   Input_file

Second solution: if your Input_file is not sorted, the following may help.

 awk '++a[$1]>1{$1="                   "} 1'   Input_file
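
To make the sorted/unsorted distinction concrete, here is a small sketch on a made-up four-line sample in which the key `a` comes back after `b`. The variable names are only for illustration, and the previous-line variant saves the key before `$1` is overwritten so that the comparison on the next line still sees the original value.

```shell
# Hypothetical sample: key "a" reappears after "b", so the input is not grouped.
sample=$(printf 'a  1\na  2\nb  3\na  4\n')

# Compare with the previous line only: the late "a" is printed again.
adjacent=$(printf '%s\n' "$sample" |
    awk '{ key = $1 } key == prev { $1 = " " } { prev = key } 1')

# Count every key seen so far: the late "a" is blanked as well.
anywhere=$(printf '%s\n' "$sample" | awk 'seen[$1]++ { $1 = " " } 1')

printf '%s\n---\n%s\n' "$adjacent" "$anywhere"
```

The counting version keeps one array entry per distinct key, which is the price of not depending on the input order.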

Answer 2: (score: 1)

To keep the formatting of each line

You can try

awk '$1!=prev{prev=new=$1;gsub("."," ",new);print;next}{sub($1,new)}1' input

If $1 contains regexp metacharacters:

awk '
  $1!=prev {
    prev=new=$1
    gsub("."," ",new)
    print
    next }
  { i=split($1,a,//)
    b=""
    for(j=1;j<=i;j++)
    b=b "[" a[j] "]"
    sub(b,new) }
1' input
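
Another way to sidestep regexp metacharacters, sketched here as an alternative rather than as part of the answer above, is to avoid regexps entirely and locate the key with `index()` and `substr()`. The function name `blank_dups` and the sample keys are made up for illustration; like the unsorted variants, it blanks every repeat, not only adjacent ones.

```shell
# blank_dups is a hypothetical helper name; it reads files or standard input.
blank_dups() {
    awk '
    NF && seen[$1]++ {
        key   = $1
        start = index($0, key)                     # literal search, no regexp involved
        pad   = sprintf("%" length(key) "s", "")   # as many spaces as the key is long
        $0    = substr($0, 1, start - 1) pad substr($0, start + length(key))
    }
    1' "$@"
}

# The key "a+b" would be read as a regexp by sub($1, ...); here it is matched literally.
printf 'a+b   x  y\na+b   z\nc     w\n' | blank_dups
```

Because `$0` is rebuilt from substrings of itself, the spacing after the key is left exactly as it was.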

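Answer 3: (score: 1)

Anything that assigns to $1 makes awk rebuild the whole record, joining the fields with OFS and so collapsing the original whitespace. The robust way to do what you asked is: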

$ awk 'seen[$1]++{rep=$1; gsub(/./," ",rep); sub(/[^[:space:]]+/,rep)} 1' file
444444              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    116,118,124-125,120,122-123,126,132.
                    25-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    110,118,124-125,120,122-123,126,132.
111111              21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    116,118,124-125,120,122.
                    21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
232323              20-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
                    116,118,124-125,120,122-123,126,132.

The above removes the duplicate $1 values and leaves everything else, including the whitespace within and between fields, exactly as it was.
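
A quick way to see the difference, using a made-up one-line input: assigning to `$1` rebuilds the record with single spaces, while `sub()` edits the text of `$0` in place. The variable names are only for illustration, and `[^ \t]+` stands in for `[^[:space:]]+` in case an older awk lacks POSIX character classes.

```shell
# Assigning to $1 makes awk rebuild $0, joining the fields with OFS (one space by default).
rebuilt=$(printf 'a   b    c\n' | awk '{ $1 = "x" } 1')

# sub() changes the first non-blank run inside $0 and leaves the spacing alone.
in_place=$(printf 'a   b    c\n' | awk '{ sub(/[^ \t]+/, "x") } 1')

printf '%s\n%s\n' "$rebuilt" "$in_place"
# prints:
# x b c
# x   b    c
```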