Question

file1:

77, 4, -3, A0080
235, 5, -1, K0511

file2:

A0132, 77, -1, -2,  19.776
00000, 77, 4, -3,  18.608,
A0794, 235, -2, -2,  22.81
A0796, 235, -2, -5,  12.27
00000, 235, 5, -1,  18.992

Desired output:

A0132, 77, -1, -2,  19.776
A0080, 77, 4, -3,  18.608,
A0794, 235, -2, -2,  22.81
A0796, 235, -2, -5,  12.27
K0511, 235, 5, -1,  18.992

Basically, I want to match column1, column2, column3 of file1 against column2, column3, column4 of file2; if they match, replace column1 of file2 with the value of column4 of file1.

I used:

awk 'FNR==NR {a[$1,$2,$3]++;next} a[$2,$3,$4]  {print $0}' file1 file2

and got the output:

00000, 77, 4, -3,  18.608,
00000, 235, 5, -1,  18.992

Then I got stuck. Please help. By the way, this is for 2 files; how would you handle more than 2 files in general?

Answer 1

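Apparently the spaces around the fields cause some problems. That complicates things, because you need a trick like $field+=0 to get around them (converting a field to a number drops the surrounding spaces).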
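A quick way to see what the $field+=0 trick does, as a minimal sketch (the function name show_plus_zero is hypothetical): with -F",", the space after each comma stays in the field, and adding 0 turns the field into a number without it. Note that this also turns a key such as 007 into 7 (and a non-numeric value such as A0080 into 0), so the trick only suits numeric key columns.

```shell
# Hypothetical helper: print field 2 before and after "$2 += 0".
show_plus_zero() {
    printf '%s\n' "$1" |
    awk -F"," '{
        before = "[" $2 "]"   # raw field: the space after the comma is kept
        $2 += 0               # numeric conversion drops the surrounding spaces
        print before, "[" $2 "]"
    }'
}

show_plus_zero '00000, 77, 4, -3,  18.608,'   # prints: [ 77] [77]
show_plus_zero 'x, 007,'                      # prints: [ 007] [7]
```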

You can try this:

awk -F"," -v OFS="," \
    'FNR==NR {$1+=0; $2+=0; $3+=0; sub(/^ +/, "", $4); a[$1,$2,$3]=$4; next}
     {$2+=0; $3+=0; $4+=0
      if (($2,$3,$4) in a) {$1=a[$2,$3,$4]}
      print
     }' f1 f2

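Basically, it stores the 4th column (the ID) in an array indexed by the 1st, 2nd and 3rd columns. Then, while reading the second file, it checks whether the 2nd, 3rd and 4th columns there form an index that exists in the array; if so, it replaces the first field with the stored ID.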
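A minimal sketch of the same idea that trims the key fields with a small function instead of converting them to numbers (fill_ids and trim are hypothetical names; any POSIX awk should do). Because only $1 is reassigned on a matching line, the other fields keep their original spacing, so the output should match the desired output in the question exactly.

```shell
# Hypothetical helper: fill_ids FILE1 FILE2
# Replaces FILE2's column 1 with FILE1's column 4 whenever FILE1's
# columns 1-3 equal FILE2's columns 2-4, ignoring surrounding blanks.
fill_ids() {
    awk -F"," -v OFS="," '
    # Strip leading and trailing blanks from a field (hypothetical helper).
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    FNR == NR {                       # first file: remember the ID per key
        ids[trim($1), trim($2), trim($3)] = trim($4)
        next
    }
    {                                 # second file: look up columns 2-4
        key = trim($2) SUBSEP trim($3) SUBSEP trim($4)
        if (key in ids) $1 = ids[key] # only the matched field is rewritten
        print
    }' "$1" "$2"
}

# Usage with the sample data from the question.
dir=$(mktemp -d)
printf '%s\n' '77, 4, -3, A0080' '235, 5, -1, K0511' > "$dir/file1"
printf '%s\n' 'A0132, 77, -1, -2,  19.776' '00000, 77, 4, -3,  18.608,' \
    'A0794, 235, -2, -2,  22.81' 'A0796, 235, -2, -5,  12.27' \
    '00000, 235, 5, -1,  18.992' > "$dir/file2"
fill_ids "$dir/file1" "$dir/file2"
```

The design choice here is to compare keys as trimmed strings: 007 stays distinct from 7, whereas the $field+=0 trick would treat them as equal. Which behavior is right depends on your data.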

For your sample input, it returns:

$ awk -F"," -v OFS="," 'FNR==NR {$1+=0; $2+=0; $3+=0; sub(/^ +/, "", $4); a[$1,$2,$3]=$4; next} {$2+=0; $3+=0; $4+=0; if (($2,$3,$4) in a) {$1=a[$2,$3,$4]} print}' f1 f2
A0132,77,-1,-2,  19.776
A0080,77,4,-3,  18.608,
A0794,235,-2,-2,  22.81
A0796,235,-2,-5,  12.27
K0511,235,5,-1,  18.992

Answer 2

cat file1 file2 \
 | sed -n 'H;${x
:cycle
# \n
#:11
# 77, 4, -3, A0080
#^2222222222 44444
#^33
# A0132, 77, -1, -2,  19.776
#^55555555555555555555555555
# 00000, 77, 4, -3,  18.608,
#^       2222222222
   s/\(\n\)\(\([^,]*,\)\{3\}\) \([A-Z0-9]*\)\(.*\)00000, \2/\1\2 \4\5\4, \2/
   t cycle
:clean
   s/\(\n\)\([^,]*,\)\{3\} [A-Z0-9]*\1/\1/g
   t clean
   s/^\n//
   p
   }'

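This is POSIX sed (use --posix with GNU sed). The comment lines starting with #^ mark the group numbers for the text on the line above them; for example, 2222222222 is the content of \2, which is reused later in the pattern.

Load all lines into the working buffer (H appends every line to the hold space, and x swaps the hold space in on the last line).
In the s/// after :cycle, find a triplet \2 (three comma-terminated fields, \([^,]*,\)\{3\}) at the start of a line that appears again later in the buffer prefixed by "00000, ", and replace that 00000 with the ID that follows the triplet on its own line (\4).
If a substitution was made, t cycle jumps back to the cycle label and tries again (the s/// has no g flag, so each pass makes exactly one change); if not, continue with the next command.
Clean up the triplet lines: remove every line that starts after a newline and contains only a triplet and an ID (these are the file1 lines), repeating with t clean because with g two adjacent file1 lines cannot both match in one pass (they share the newline between them). Then remove the leading newline: H appends each line after a newline, so the buffer starts with a newline before the first line.
Print the result.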
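To see the buffer the sed script works on, here is a minimal sketch (show_buffer is a hypothetical name) that loads lines the same way, with H on every line and x on the last one, and prints each newline as <NL>. The leading <NL> is the newline that H adds before the first line, which is why the script ends with s/^\n//.

```shell
# Hypothetical helper: slurp all input lines into one buffer the way
# Answer 2 does (H on every line, x on the last), then mark newlines.
show_buffer() {
    printf '%s\n' "$@" |
    sed -n 'H
$ {
  x
  s/\n/<NL>/g
  p
}'
}

show_buffer 'line1' 'line2'    # prints: <NL>line1<NL>line2
```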

Answer 3

awk 'FILENAME==ARGV[1]{max++;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;next} {done=0;for (i=1;i<=max;i++) {if ($2==a1[i] && $3==a2[i] && $4==a3[i]) {$1=""; print a4[i]","$0; done=1; break}}; if (done==0){ print}}' file1 file2

Or, more readably:

awk 'FILENAME==ARGV[1]{ ## process file 1
   max++;               ## keep track of how many entries are in file 1
   a1[FNR]=$1;          ## build separate arrays for each field we care about
   a2[FNR]=$2;
   a3[FNR]=$3;
   a4[FNR]=$4;
   next}                ## skip to the next input line
  {done=0;              ## set a flag so we know when we have no match
   for (i=1;i<=max;i++) ## loop over all array entries from file 1
   {if ($2==a1[i] && $3==a2[i] && $4==a3[i]) ## if the columns match
     {$1="";            ## get rid of column 1
      print a4[i]","$0; ## print file 1 column 4, then file 2 from column 2 onward
      done=1;           ## set the flag so we know we had a match
      break}};          ## leave the loop, no need to check more entries
     if (done==0) {     ## if we did not match, print the file 2 line unchanged
        print}}' \
file1 file2

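If you want to extend this to more files, you can add more clauses that test FILENAME against ARGV[n] (changing the logic to whatever you need, of course). If you want it to be automated and flexible, you could build the awk program with a shell loop and run it with eval: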

 awk 'FILENAME==ARGV[1]{a[FNR]=$0;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;next}
      FILENAME==ARGV[2]{b[FNR]=$0;b2[FNR]=$2;b3[FNR]=$3;b4[FNR]=$4;next}
      FILENAME==ARGV[3]{print "hi" a1[FNR] b2[FNR]}' file1 file2 file3
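For more than two files, here is a sketch of a different technique than the eval approach, under the assumption that every file except the last is a lookup file in file1's format and the last file is the data file (fill_ids_multi is a hypothetical name). It tells the files apart by comparing FILENAME with ARGV[ARGC-1], the last command-line argument, so the data file must not also be passed as a lookup file.

```shell
# Hypothetical helper: fill_ids_multi LOOKUP... DATA
# Every argument except the last is a lookup file in file1's format
# (a, b, c, ID); the last argument is the data file in file2's format.
# When several lookup files define the same key, the later file wins.
fill_ids_multi() {
    awk -F"," -v OFS="," '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    FILENAME != ARGV[ARGC-1] {        # any lookup file
        ids[trim($1), trim($2), trim($3)] = trim($4)
        next
    }
    {                                 # the data file (last argument)
        key = trim($2) SUBSEP trim($3) SUBSEP trim($4)
        if (key in ids) $1 = ids[key]
        print
    }' "$@"
}

# Usage: file1 split into two lookup files, plus a data file.
dir=$(mktemp -d)
printf '%s\n' '77, 4, -3, A0080' > "$dir/lookup1"
printf '%s\n' '235, 5, -1, K0511' > "$dir/lookup2"
printf '%s\n' 'A0132, 77, -1, -2,  19.776' '00000, 77, 4, -3,  18.608,' \
    '00000, 235, 5, -1,  18.992' > "$dir/data"
fill_ids_multi "$dir/lookup1" "$dir/lookup2" "$dir/data"
```

Since all lookup files go into one array, adding a file is just adding an argument; no generated awk program is needed.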

Update to handle the data listed in the comments:

awk 'FILENAME==ARGV[1]{max++;a[FNR]=$0;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;$1="";$2="";$3="";$4="";a[FNR]=$0;gsub(",+$","",a[FNR]);next} {done=0;for (i=1;i<=max;i++) {if ($2==a1[i] && $3==a2[i] && $4==a3[i]) {$1=""; gsub(",+$","",$0);gsub(" +","",a[i]);print " "a4[i]$0","a[i]; done=1; break}}; if (done==0){ print}}' file1 file2

The changes append the remaining fields of the matching file 1 line and clean up some cosmetics:

awk 'FILENAME==ARGV[1] {
## save $0 and the first four fields in separate arrays
max++;a[FNR]=$0;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;
## blank fields 1 to 4 so a[] keeps only the remaining fields, then drop the trailing commas
$1="";$2="";$3="";$4="";a[FNR]=$0;gsub(",+$","",a[FNR]); next}
{done=0;for (i=1;i<=max;i++) {if ($2==a1[i] && $3==a2[i] && $4==a3[i])
{$1=""; gsub(",+$","",$0);gsub(" +","",a[i]); ## strip unnecessary whitespace
## print the rest of the file 1 line entry
print " "a4[i]$0","a[i]; done=1; break}}; if (done==0){ print}}' file1 file2

Answer 4

This might work for you (GNU sed):

sed -r 's|^(.*,)\s*(.*)|s/^(.*,) \1/\2, \1/|' file1 | sed -rf - file2

This builds a sed script from file1 and runs it against file2.
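To check what Answer 4 actually runs against file2, you can run the first sed on its own and look at the generated script. A minimal sketch (gen_script is a hypothetical wrapper around the same sed command; -r and \s are GNU sed extensions). Because the file1 values end up inside a regular expression, values containing metacharacters such as . or * would need escaping first.

```shell
# Hypothetical helper: gen_script builds the sed script of Answer 4
# from file1-style lines on stdin (GNU sed: -r and \s are extensions).
gen_script() {
    sed -r 's|^(.*,)\s*(.*)|s/^(.*,) \1/\2, \1/|'
}

printf '%s\n' '77, 4, -3, A0080' '235, 5, -1, K0511' | gen_script
# prints:
# s/^(.*,) 77, 4, -3,/A0080, 77, 4, -3,/
# s/^(.*,) 235, 5, -1,/K0511, 235, 5, -1,/
```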

Match elements in multiple columns from two files, then update or merge them
