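I am trying to compare the strings in columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to this match, column 6 of FILE2 must match one of the strings SO or CO (these are the headers of columns 3 and 4 of FILE1). When everything matches, column 7 of FILE2 should be replaced with the corresponding value from FILE1 (column 3 for SO, column 4 for CO); otherwise the line should be left unchanged.
I tried adapting solutions posted on the forum for similar problems, but without success.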
FILE1
type code SO CO other
7757 1 6941.958 138.922 149.17
7757 2 8666.123 198.908 225.67
7757 4 2795.885 334.875 378.68
7759 GT3 222.104 13.5 734.62
7768 CT2 0 0 0
7805 6 3796.677 75.175 79.09
FILE2
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"
Required output:
"US","01073",,"7757","1","SO","6941.958","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","138.922","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","75.175","299"
The solution I tried (for CO only):
tr -d '"' < FILE2 > temp # to remove double quote
awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
Answer 0 (score: 2)
A more involved awk solution:
awk 'function unquote(f){
return substr(f, 2, length(f)-2)
}
NR==FNR{
if (NR==1){ f3=$3; f4=$4 }
else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
next;
}
{ k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
k in a{ $7=a[k] }1' file1 FS=',' OFS=',' file2
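function unquote(f) { ... }
- unquotes the field, i.e. extracts the value between the double quotes (strictly speaking, everything between the first and last characters of the string)
a[$1,$2,f3]=$3; a[$1,$2,f4]=$4
- stores the FILE1 values under composite keys made of type, code and column name (SO or CO)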
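To make the two pieces above concrete, here is a minimal sketch that runs unquote on a single FILE2 line and prints the resulting lookup key. The variable name key and the "|" separator are only for illustration; the real script joins the parts with SUBSEP, which is not printable.

```shell
# Illustration only: strip the quotes from fields 4-6 of one FILE2 line
# and join them with "|" (the actual solution joins them with SUBSEP).
key=$(printf '%s\n' '"US","01073",,"7757","1","SO","10","299"' |
  awk -F',' 'function unquote(f){ return substr(f, 2, length(f)-2) }
             { print unquote($4) "|" unquote($5) "|" unquote($6) }')
echo "$key"
```

The printed key corresponds to the entry a["7757","1","SO"] that the first pass builds from the FILE1 header and its first data row.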
Output:
"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"
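Note that the output above leaves the replaced values unquoted, whereas the required output in the question keeps them in double quotes. If the quotes matter, one possible variant (a sketch, not part of the original answer) wraps the looked-up value in quotes when assigning $7. The temporary directory and the variable names tmpdir and quoted_out are assumptions made only so the example is self-contained.

```shell
# Recreate the sample inputs from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
type code SO CO other
7757 1 6941.958 138.922 149.17
7757 2 8666.123 198.908 225.67
7757 4 2795.885 334.875 378.68
7759 GT3 222.104 13.5 734.62
7768 CT2 0 0 0
7805 6 3796.677 75.175 79.09
EOF
cat > "$tmpdir/file2" <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

# Same logic as the answer; the only change is that the replacement
# value is wrapped in double quotes before it is written to $7.
quoted_out=$(awk 'function unquote(f){ return substr(f, 2, length(f)-2) }
NR==FNR{
  if (NR==1){ f3=$3; f4=$4 }          # remember the SO and CO header names
  else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
  next
}
{ k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
k in a{ $7="\"" a[k] "\"" }1' "$tmpdir/file1" FS=',' OFS=',' "$tmpdir/file2")

printf '%s\n' "$quoted_out"
rm -rf "$tmpdir"
```

With the sample data, this should reproduce the required output from the question, including the quotes around the replaced values.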