How can I replace complex IDs with numbers?

Time: 2013-11-21 19:54:34

Tags: awk

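I have a file with multiple entries for each ID number. The file has about 2,000 IDs with 54,000 observations each. I need to feed the output to an algorithm that requires IDs shorter than 6 characters. How can I replace the IDs with the numbers 1 to 2000? The IDs in the file look like this: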

2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569

I need it to look like this (I want to keep the original IDs):

1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569

Thanks

2 answers:

Answer 0: (score: 4)

$ cat file
2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569
$ 
$ awk '!seen[$0]++{++id} {print id, $0}' file
1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569
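
For readers unfamiliar with the idiom, here is the same one-liner spread out with comments. It is only a sketch: the sample data is piped in rather than read from `file`, and the shell variable `out` is introduced purely for illustration. The key point is that `seen[$0]++` yields 0 the first time a line appears, so `!seen[$0]++` is true only on first occurrences.

```shell
# Sample input piped in so the sketch is self-contained (the data is illustrative).
out=$(printf '%s\n' 2007I804567 2007I804567 2007I804568 2007I804569 2007I804569 |
awk '
    !seen[$0]++ { ++id }   # seen[$0] is 0 on the first occurrence, so the counter advances
    { print id, $0 }       # every line gets the current counter, then the original ID
')
printf '%s\n' "$out"
```

Because every line is printed with the current value of the counter, this relies on identical IDs sitting on consecutive lines, as they do in the question's sample.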

Answer 1: (score: 2)

Try the following awk command:

awk '!($0 in id) {id[$0]=++n} {print id[$0], $0}' file

Short explanation

awk '
    !($0 in id) {             # if the line is not yet a key in the array id
         id[$0]=++n           # store the next number under that line, so each distinct line maps to one number
    }
    {
        print id[$0], $0      # print the stored number followed by the line itself
    }' file                   # input file
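
One practical difference between the counter-based and lookup-based one-liners shows up only when the same ID reappears after a different ID. The question's sample is grouped, so both give the same result there. The sketch below uses a made-up ungrouped input, and the shell variable names are illustrative, to show what each does in that case.

```shell
# Illustrative input where the first ID comes back after another ID.
input=$(printf '%s\n' 2007I804567 2007I804568 2007I804567)

# Counter approach: the repeated ID picks up the current counter value.
counter_out=$(printf '%s\n' "$input" | awk '!seen[$0]++{++id} {print id, $0}')

# Lookup approach: the repeated ID gets back the number stored for it.
lookup_out=$(printf '%s\n' "$input" | awk '!($0 in id) {id[$0]=++n} {print id[$0], $0}')

printf 'counter:\n%s\nlookup:\n%s\n' "$counter_out" "$lookup_out"
```

With this input the counter approach labels the third line 2, while the lookup approach labels it 1. If the file is not guaranteed to be grouped by ID, the lookup form is probably the safer choice. If the ID were only the first of several columns, keying on `$1` instead of `$0` would be the natural adjustment, though the question shows only the ID itself.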