I have a source file containing more than 2 million lines of text that look like this:
388708091|347|||||0010.60|N01/2012|
388708101|348|||||0011.60|N01/2012|
388708101|349|||||0012.60|N01/2012|
388719001|348|||||0010.38|M05/2013|
388719001|349|||||0011.38|M05/2013|
I want to map and replace the second column (with values such as 347, 348, 349, and so on) using a map that looks like this:
346 309
347 311
348 312
349 313
350 314
351 315
352 316
Note that although the map is only two columns wide, it has more than 100 rows.
What is the most efficient command-line way to replace the data in the second column of the source file using the target map?
Answer 0 (score: 2)
awk seems to be the tool for the job:
awk 'NR == FNR { a[$1] = $2; next } FNR == 1 { FS = "|"; OFS = FS; $0 = $0 } { $2 = a[$2] } 1' mapfile datafile
The code works as follows:
NR == FNR { # while processing the first file (mapfile)
a[$1] = $2 # remember the second field by the first
next # do nothing else
}
FNR == 1 { # at the first line of the second file (datafile):
FS = "|" # start splitting by | instead of whitespace
OFS = FS # delimit output the same way as the input
$0 = $0 # force resplitting of this first line
}
{ # for all lines in the second file:
$2 = a[$2] # replace the 2nd field with the remembered value for that key
}
1 # print the line
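As a quick sanity check, the one-liner can be run against a few of the sample rows from the question. This is only a sketch: the scratch directory and the `result` variable are illustrative, and the file names mapfile and datafile simply follow the command above.

```shell
#!/bin/sh
# Recreate a small slice of the question's data in a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/mapfile" <<'EOF'
347 311
348 312
349 313
EOF

cat > "$tmp/datafile" <<'EOF'
388708091|347|||||0010.60|N01/2012|
388708101|348|||||0011.60|N01/2012|
388719001|349|||||0011.38|M05/2013|
EOF

# Same program as in the answer: load the map first, then rewrite column 2.
result=$(awk 'NR == FNR { a[$1] = $2; next } FNR == 1 { FS = "|"; OFS = FS; $0 = $0 } { $2 = a[$2] } 1' "$tmp/mapfile" "$tmp/datafile")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these inputs the second column should come out as 311, 312, and 313, with every other field, including the trailing `|`, left unchanged.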
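Caveat: this assumes that every value in the second column of the data file has a corresponding entry in the map file; values that do not will be replaced with an empty string. If that is not what you want, replace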
{ $2 = a[$2] }
with
{ if($2 in a) { $2 = a[$2] } else { $2 = "something else" } }
It is not obvious to me what should happen in that case.
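One plausible choice, which is an assumption and not something the question specifies, is to leave unmapped values exactly as they are. A minimal sketch of that variant follows; the function name `remap_keep_unmapped`, the sample value 999, and the scratch files are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remap column 2, leaving values without a map entry untouched.
# $1 = map file (whitespace-separated), $2 = data file (pipe-separated)
remap_keep_unmapped() {
    awk 'NR == FNR { a[$1] = $2; next }                    # first file: load the map
         FNR == 1  { FS = "|"; OFS = FS; $0 = $0 }         # second file: switch to | and resplit
         ($2 in a) { $2 = a[$2] }                          # replace only when a mapping exists
         1' "$1" "$2"                                      # print every line
}

tmp=$(mktemp -d)
printf '347 311\n' > "$tmp/mapfile"
cat > "$tmp/datafile" <<'EOF'
388708091|347|||||0010.60|N01/2012|
388708101|999|||||0011.60|N01/2012|
EOF

kept=$(remap_keep_unmapped "$tmp/mapfile" "$tmp/datafile")
printf '%s\n' "$kept"
rm -rf "$tmp"
```

Here the first row gets 311 in its second column, while the row carrying 999 passes through unchanged because 999 has no entry in the map. Whether passing through, substituting a marker, or reporting an error is the right behavior depends on the data, so treat this as one option among several.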