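Replace a column in a file while preserving the whitespace formatting

Time: 2015-04-28 09:33:02

Tags: bash unix awk

I have the code below, which replaces column 4 of fileA based on the data in fileB, but the output does not preserve the original file's spacing. Is there a way to do this?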

 tr , " " <fileB | awk 'NR==FNR{a[$2]=$1;next} {$4=a[$4];print}' - fileA

fileA

 xxx    xxx   xxx Z0002

fileB

 3100,3000
 W0002,Z0002

Output using the code above:

 xxx xxx xxx W0002

Expected output:

 xxx    xxx   xxx W0002

3 answers:

Answer 0 (score: 1)

This should do it:

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next} {n=split($0,d,/[^[:space:]]+/);if($4 in b)$4=b[$4];for(i=1;i<=n;i++) printf("%s%s",d[i],$i);print ""}' fileB fileA

It stores the runs of whitespace in an array so that they can be reused when the line is printed.
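As a rough illustration of the idea (a minimal sketch, not the exact script above): assigning to a field makes awk rebuild $0 with OFS, which is why the spacing collapses, while splitting the line on its non-blank runs keeps the separators for later. The replacement value is hard-coded here, and `[^ \t]+` is used instead of `[^[:space:]]+` only on the assumption that some older awk builds lack POSIX character classes.

```shell
line='xxx    xxx   xxx Z0002'

# Assigning to a field makes awk rebuild $0 with OFS (a single space by default).
collapsed=$(printf '%s\n' "$line" | awk '{ $4 = "W0002"; print }')

# Splitting on the runs of non-blank characters leaves the whitespace runs in d[],
# so every field can be printed after its original separator.
preserved=$(printf '%s\n' "$line" | awk '{
    n = split($0, d, /[^ \t]+/)   # d[1] is the leading whitespace (empty here)
    $4 = "W0002"                  # hard-coded replacement, for illustration only
    out = ""
    for (i = 1; i <= n; i++) out = out d[i] $i
    print out
}')

printf '%s\n%s\n' "$collapsed" "$preserved"
```

The first line printed has single spaces between the fields; the second keeps the spacing of the input.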

Example:

cat fileA
xxx    xxx   xxx Z0002   not change this
xxx   xxx  Z0002 zzz
xxx Z000223213 xxx Z0002 xxx xxx xxx Z0002

cat fileB
3100,3000
W0002,Z0002

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next} {n=split($0,d,/[^[:space:]]+/);if($4 in b)$4=b[$4];for(i=1;i<=n;i++) printf("%s%s",d[i],$i);print ""}' fileB fileA
xxx    xxx   xxx W0002   not change this
xxx   xxx  Z0002 zzz
xxx Z000223213 xxx W0002 xxx xxx xxx Z0002

A more readable version, with comments on how it works:

awk '
FNR==NR {                           # For the first file, "fileB"
    split($0,a,",")                 # Split the line into array "a" using "," as separator
    b[a[2]]=a[1]                    # Store the first column in array "b", indexed by the second column
    next                            # Skip to the next record
    }
    {                               # Then, for the second file, "fileA"
    n=split($0,d,/[^[:space:]]+/)   # Split on the non-space runs, so array "d" holds the whitespace groups
    if($4 in b)                     # If array "b" has an entry for field 4
        $4=b[$4]                    # Change field 4 to the value found in array "b"
    for(i=1;i<=n;i++)               # Loop through all the fields in the line
        printf("%s%s",d[i],$i)      # Print the original separator followed by the field
    print ""                        # Add a newline at the end
    }
' fileB fileA                       # Read fileB first, then fileA

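Answer 1 (score: 0)

Using gsub (regular-expression substitution) with a pattern that has a space before the value and the end-of-line anchor $ after it solves the problem. Because of the $ anchor, the value is only replaced when column 4 is the last thing on the line.

Test file: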

$ cat fileA
xxx    xxx   xxx Z0002
xxx    xxx   Z0002 xxx
xxx    xxx   xxx Z0002YY

Command and result:

$ tr , " " <fileB | awk 'NR==FNR{a[$2]=$1;next}  a[$4]=="" {print} a[$4]!=""{gsub(" "$4"$", " "a[$4], $0);print}' - fileA
xxx    xxx   xxx W0002
xxx    xxx   Z0002 xxx
xxx    xxx   xxx Z0002YY
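To see why this keeps the spacing, and where it stops working, here is a small sketch with the key and replacement hard-coded (Z0002 and W0002 come from the sample data; the second input line is made up for illustration). sub() edits $0 directly, so awk never rebuilds the record, and the $ anchor means only a value at the very end of the line is touched.

```shell
# Two input lines: in the first, Z0002 ends the line; in the second, column 4
# is Z0002 but more text follows it.
result=$(printf '%s\n' 'xxx    xxx   xxx Z0002' 'xxx   Z0002  xxx Z0002 tail' |
    awk '{ sub(/ Z0002$/, " W0002"); print }')

printf '%s\n' "$result"
```

Only the first line changes; the second is printed as-is because nothing matches at the end of the line.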

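Answer 2 (score: 0)

Long awk answer

This is a bit overkill for this question, but I think it may be useful to others.

It avoids problems with regex metacharacters and with the pattern appearing elsewhere on the line.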
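A small sketch of the metacharacter problem, using a made-up key `a.c` and replacement `X` (neither is from the question): when the key is used as a regex, the dot matches any character, so the wrong text gets replaced. Searching for a literal string with index() avoids that, although index() still returns the first occurrence on the line, which is why tracking the position of the column itself is safer.

```shell
line='abc   a.c'

# sub() treats its first argument as a regex, so "." matches any character
# and the leftmost hit ("abc") is replaced instead of the intended "a.c".
by_regex=$(printf '%s\n' "$line" | awk '{ sub("a.c", "X"); print }')

# index() looks for a literal string; the line is then rebuilt with substr().
by_literal=$(printf '%s\n' "$line" | awk '{
    key = "a.c"
    p = index($0, key)
    if (p) $0 = substr($0, 1, p - 1) "X" substr($0, p + length(key))
    print
}')

printf '%s\n%s\n' "$by_regex" "$by_literal"
```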

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next}
     {
         rest=$0; pos=0; D=0
         while(match(rest,/[^[:space:]]+/)){
             E[++D]=pos+RSTART                # start of field D in $0
             F[D]=E[D]+RLENGTH                # first position after field D
             pos+=RSTART+RLENGTH-1            # characters of $0 consumed so far
             rest=substr(rest,RSTART+RLENGTH)
         }
     }

     D>=4 && ($4 in b) {$0=substr($0,1,E[4]-1) b[$4] substr($0,F[4])}
     1' fileB fileA

Example

Input

fileA

xxx    Z0002   xxx Z0002 xxx    xxx   xxx Z0002
xxx    Z0002   xxx dsasa xxx    xxx   xxx Z0002

fileB

3100,3000
W0002,Z0002

Output

xxx    Z0002   xxx W0002 xxx    xxx   xxx Z0002
xxx    Z0002   xxx dsasa xxx    xxx   xxx Z0002

Explanation

To be added later.
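In the meantime, here is a simplified sketch (not the answer's author's explanation) of how the start and end position of each field can be located with match() and substr(). The variable names rest, pos, first and last are made up for this example, and `[^ \t]+` stands in for `[^[:space:]]+` on the assumption that some awk builds lack POSIX character classes. Once the positions of field 4 are known, the line can be rebuilt as everything before the field, the new value, and everything after it.

```shell
line='xxx    Z0002   xxx Z0002'

offsets=$(printf '%s\n' "$line" | awk '{
    rest = $0; pos = 0; n = 0
    while (match(rest, /[^ \t]+/)) {
        first = pos + RSTART             # absolute 1-based start of the field
        last  = first + RLENGTH - 1      # absolute position of its last character
        printf "%d:%d-%d\n", ++n, first, last
        pos  += RSTART + RLENGTH - 1     # characters of the line consumed so far
        rest  = substr(rest, RSTART + RLENGTH)
    }
}')

printf '%s\n' "$offsets"
```

For the sample line this reports field 4 at positions 20 to 24, so replacing it means joining substr($0, 1, 19), the new value, and substr($0, 25).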