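I have a file in which each ID has multiple entries. The file contains about 2,000 IDs, each with 54,000 observations. I need to feed the output to an algorithm that requires IDs shorter than 6 characters. How can I replace the IDs with the numbers 1 to 2000? The IDs in the file look like this: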
2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569
I need it to look like this (I want to keep the original IDs):
1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569
Thanks
Answer 0 (score: 4)
$ cat file
2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569
$
$ awk '!seen[$0]++{++id} {print id, $0}' file
1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569
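Note that this one-liner bumps the counter only the first time a line is seen and then prints the current counter, so it numbers IDs as intended when identical IDs are adjacent, as in the sample. A minimal sketch of what happens otherwise, using a hypothetical input in which the first ID reappears later (the function name number_ids_adjacent is made up for illustration):

```shell
# Same awk program as above, wrapped in a helper that reads stdin.
number_ids_adjacent() {
  awk '!seen[$0]++{++id} {print id, $0}'
}

# Hypothetical input: the first ID comes back after a different one.
printf '%s\n' 2007I804567 2007I804568 2007I804567 | number_ids_adjacent
# prints:
# 1 2007I804567
# 2 2007I804568
# 2 2007I804567   <- repeated ID gets the current counter, not its original 1
```

If the IDs are not guaranteed to be grouped, sorting the file first or storing the number per ID in an array avoids this.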
Answer 1 (score: 2)
Try the following awk:
awk '!($0 in id) {id[$0]=++n} {print id[$0], $0}' file
Brief explanation:
awk '
!($0 in id) {            # if this line is not yet a key in the array id
    id[$0] = ++n         # assign it the next number; the array is keyed by line content
}
{
    print id[$0], $0     # print the assigned number followed by the original line
}' file                  # input file
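The question mentions many observations per ID, so the real file may carry extra columns after the ID. A minimal sketch of the same idea keyed on the first field instead of the whole line, assuming whitespace-separated columns with the ID in field 1 (the function name number_by_first_field and the sample values are hypothetical):

```shell
# Assumption: the ID is the first whitespace-separated field of each line.
number_by_first_field() {
  # Look up or assign a number per distinct $1, then print it before the line.
  awk '!($1 in id) { id[$1] = ++n } { print id[$1], $0 }' "$@"
}

# Hypothetical two-column input; with no file argument the function reads stdin.
printf '%s\n' '2007I804567 0.1' '2007I804568 0.2' '2007I804567 0.3' | number_by_first_field
# prints:
# 1 2007I804567 0.1
# 2 2007I804568 0.2
# 1 2007I804567 0.3
```

Because the number is stored per ID, a repeated ID gets the same number even when its rows are not adjacent.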