Question

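I have the code below, which replaces the 4th column of fileA with data from fileB, but the output does not keep the original file's spacing. Is there any way to do that?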

 tr , " " <fileB | awk 'NR==FNR{a[$2]=$1;next} {$4=a[$4];print}' - fileA

fileA:

 xxx    xxx   xxx Z0002

fileB:

 3100,3000
 W0002,Z0002

Output with the code above:

 xxx xxx xxx W0002

Expected output:

xxx    xxx   xxx W0002
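The spacing is lost because assigning to a field, as $4=a[$4] does, makes awk rebuild $0 by joining all fields with OFS, which is a single space by default. A minimal shell sketch of that behavior (the function names with_assignment and without_assignment are made up for illustration):

```shell
# Hypothetical helper: assigning to any field (here $4) makes awk rebuild
# $0 by joining all fields with OFS, which defaults to a single space.
with_assignment() {
  awk '{ $4 = "W0002"; print }'
}

# Hypothetical helper: without a field assignment, $0 is printed
# exactly as it was read.
without_assignment() {
  awk '{ print }'
}

printf 'xxx    xxx   xxx Z0002\n' | with_assignment     # xxx xxx xxx W0002
printf 'xxx    xxx   xxx Z0002\n' | without_assignment  # xxx    xxx   xxx Z0002
```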

Answer 1

This should do it:

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next} {n=split($0,d,/[^[:space:]]*/);if(b[$4])$4=b[$4];for(i=1;i<=n;i++) printf("%s%s",d[i],$i);print ""}' fileB fileA

It stores the whitespace between the fields in an array, so the original spacing can be put back when the line is printed.
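A minimal sketch of what that split leaves behind. It uses [^ \t]+ (a run of characters other than space and tab) in place of the answer's /[^[:space:]]*/, an assumption made so that no separator match can be empty; show_whitespace_runs is a hypothetical helper name.

```shell
# Hypothetical helper: print every element that split() produces when the
# separator is a run of characters other than space and tab.
# What is left in d[] is the whitespace around the fields.
show_whitespace_runs() {
  awk '{
    n = split($0, d, /[^ \t]+/)
    for (i = 1; i <= n; i++)
      printf "d[%d]=\"%s\"\n", i, d[i]
  }'
}

printf 'xxx    xxx   xxx Z0002\n' | show_whitespace_runs
```

For that line it prints five entries: an empty d[1], then the runs of four, three and one spaces, then an empty d[5]. So d[i] is exactly the whitespace that comes before field $i, which is why printing d[i] followed by $i rebuilds the line.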

Example:

cat fileA
xxx    xxx   xxx Z0002   not change this
xxx   xxx  Z0002 zzz
xxx Z000223213 xxx Z0002 xxx xxx xxx Z0002

cat fileB
3100,3000
W0002,Z0002

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next} {n=split($0,d,/[^[:space:]]*/);if(b[$4])$4=b[$4];for(i=1;i<=n;i++) printf("%s%s",d[i],$i);print ""}' fileB fileA
xxx    xxx   xxx  W0002   not change this
xxx   xxx  Z0002 zzz
xxx Z000223213 xxx  W0002 xxx xxx xxx Z0002

A more readable version, with an explanation of how it works:

awk '
FNR==NR {                           # For the first file, "fileB"
    split($0,a,",")                 # Split the line into array "a", using "," as the separator
    b[a[2]]=a[1]                    # Store column 1 in array "b", indexed by column 2
    next                            # Skip to the next record
    }
    {                               # Then, for the file "fileA"
    n=split($0,d,/[^[:space:]]*/)   # Split on the non-space text, so "d" holds the runs of whitespace
    if(b[$4])                       # If array "b" has data for field 4
        $4=b[$4]                    # Replace field 4 with the data found in array "b"
    for(i=1;i<=n;i++)               # Loop through every whitespace run and field in the line
        printf("%s%s",d[i],$i)      # Print the original whitespace, then the field
    print ""                        # Add a newline at the end
    }
' fileB fileA                       # Read fileB first, then fileA

Answer 2

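Use gsub (regular-expression substitution) with a pattern that has a space in front of the value and the end-of-line anchor $ after it; that solves the problem. Because of the anchor, the value is only replaced when it is the last field on the line.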

Test file:

$ cat fileA
xxx    xxx   xxx Z0002
xxx    xxx   Z0002 xxx
xxx    xxx   xxx Z0002YY

Command and result:

$ tr , " " <fileB | awk 'NR==FNR{a[$2]=$1;next}  a[$4]=="" {print} a[$4]!=""{gsub(" "$4"$", " "a[$4], $0);print}' - fileA
xxx    xxx   xxx W0002
xxx    xxx   Z0002 xxx
xxx    xxx   xxx Z0002YY
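Because the gsub pattern is built from $4, any regex metacharacters in the value (such as . or *) would be treated as part of the pattern. A sketch of the same last-field idea using only string functions, assuming fileB holds new,old pairs as in the question. replace_last_col4 is a hypothetical helper name, and it only touches lines that have exactly four fields and end with field 4.

```shell
# Hypothetical helper. $1 is a mapping file with "new,old" pairs (like
# fileB), $2 is the data file (like fileA).
replace_last_col4() {
  tr , ' ' < "$1" | awk '
    NR == FNR { a[$2] = $1; next }   # mapping: old value -> new value
    # Only lines with exactly four fields whose text ends with field 4
    NF == 4 && ($4 in a) && substr($0, length($0) - length($4) + 1) == $4 {
      # Keep everything before field 4 as-is, then append the new value
      $0 = substr($0, 1, length($0) - length($4)) a[$4]
    }
    { print }
  ' - "$2"
}
```

Usage: replace_last_col4 fileB fileA. On the test file above it gives the same three lines as the gsub version, since assigning a string to $0 (rather than to a field) does not rebuild the line with OFS.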

Answer 3

Long awk answer

This is a bit overkill for this question, but I think it will be useful to others.

It avoids problems with regex metacharacters, and with the same value appearing elsewhere on the line.

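awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next}   # FILEB: map column 2 to column 1
     {
         # Record the start (E) and end+1 (F) position of every run of
         # non-space characters in the line
         while(match(substr($0,x+=(RSTART+RLENGTH-(x>1?1:0))),"[^[:space:]]+")){
             E[++D]=(RSTART+x-(x>1?1:0))
             F[D]=E[D]+RLENGTH
         }
     }

     # If field 4 has a replacement, splice it between the untouched
     # text before and after field 4 (substr positions start at 1)
     b[$4]~/./{$0=substr($0,1,E[4]-1) b[$4] substr($0,F[4])}
     # Reset the position counters, then print the line (the trailing 1)
     {x=1;D=0;delete E}1' FILEB FILEA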

Example

Input

FILEA

xxx    Z0002   xxx Z0002 xxx    xxx   xxx Z0002
xxx    Z0002   xxx dsasa xxx    xxx   xxx Z0002

FILEB

3100,3000
W0002,Z0002

Output

xxx    Z0002   xxx W0002 xxx    xxx   xxx Z0002
xxx    Z0002   xxx dsasa xxx    xxx   xxx Z0002

Explanation

To be added later.
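The position-tracking idea can be sketched more plainly as a shell function. This is a simplified restatement, not the answer's own code: it walks the line with match(), and when it reaches the fourth run of non-blank characters it splices the replacement in with substr(), so the whitespace and every other field stay exactly as they were. replace_col4_by_position is a hypothetical name, and fields are assumed to be separated by spaces or tabs only.

```shell
# Hypothetical helper. $1 is the mapping file ("new,old" pairs, like
# FILEB), $2 is the data file (like FILEA).
replace_col4_by_position() {
  awk '
    FNR == NR { split($0, m, ","); b[m[2]] = m[1]; next }
    ($4 in b) {
      rest = $0; offset = 0; k = 0
      # Walk the runs of non-blank characters (the fields) left to right
      while (match(rest, /[^ \t]+/)) {
        if (++k == 4) {
          start = offset + RSTART          # position of field 4 in $0
          # Splice: text before field 4, new value, text after field 4
          $0 = substr($0, 1, start - 1) b[$4] substr($0, start + RLENGTH)
          break
        }
        offset += RSTART + RLENGTH - 1     # characters of $0 consumed so far
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
    { print }
  ' "$1" "$2"
}
```

Usage: replace_col4_by_position FILEB FILEA reproduces the output shown in the example above. Since no regex is built from the data, metacharacters in the values are harmless, and only the fourth field is touched even when the same value appears elsewhere on the line.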
