Replace multiple columns with data from another file if the first two columns match

Posted: 2020-10-15 20:53:54

Tags: awk pdb-files

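I need to match columns 1 and 2 of ori_file2.pdb against columns 1 and 2 of new_file1.pdb. Where they match, columns 3, 4, 5 and 6 of new_file1.pdb should be replaced with the data from the same columns of ori_file2.pdb, without changing the spacing between the columns of new_file1.pdb.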

ori_file2.pdb

HELIX    1   1 PHE A    2  ALA A    7  1                                   6
ATOM      1  N   PHE A   1      -3.631  -3.776  -2.910  1.00  0.00           N
ATOM      2  CA  PHE A   1      -2.182  -3.776  -2.910  1.00  0.00           C
ATOM      3  C   PHE A   1      -1.659  -2.347  -2.910  1.00  0.00           C
ATOM      4  O   PHE A   1      -0.766  -2.011  -2.135  1.00  0.00           O
ATOM      5  CB  PHE A   1      -1.630  -4.477  -4.142  1.00  0.00           C
ATOM      6  CG  PHE A   1      -1.888  -5.964  -4.196  1.00  0.00           C
ATOM      7  CD2 PHE A   1      -1.053  -6.844  -3.498  1.00  0.00           C
ATOM      8  CD1 PHE A   1      -2.962  -6.461  -4.943  1.00  0.00           C
ATOM      9  CE1 PHE A   1      -3.201  -7.840  -4.993  1.00  0.00           C
ATOM     10  CZ  PHE A   1      -2.366  -8.721  -4.295  1.00  0.00           C
ATOM     11  CE2 PHE A   1      -1.292  -8.223  -3.548  1.00  0.00           C
ATOM     12  N   PHE A   2      -2.218  -1.506  -3.783  1.00  0.00           N
ATOM     13  CA  PHE A   2      -1.808  -0.119  -3.881  1.00  0.00           C
ATOM     14  C   PHE A   2      -1.962   0.568  -2.532  1.00  0.00           C

new_file1.pdb

MODEL 1
COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.90
ATOM      1  N   LIG L   2     -28.497 -21.375   1.835  1.00  0.00           N  
ATOM      2  C   LIG L   2     -27.282 -21.191   1.068  1.00  0.00           C  
ATOM      3  C   LIG L   2     -27.048 -22.391   0.162  1.00  0.00           C  
ATOM      4  O   LIG L   2     -26.148 -23.191   0.408  1.00  0.00           O  
ATOM      5  C   LIG L   2     -26.071 -21.047   1.977  1.00  0.00           C  
ATOM      6  C   LIG L   2     -26.119 -19.866   2.917  1.00  0.00           C  
ATOM      7  C   LIG L   2     -26.393 -20.064   4.275  1.00  0.00           C  
ATOM      8  C   LIG L   2     -25.887 -18.575   2.430  1.00  0.00           C  
ATOM      9  C   LIG L   2     -25.932 -17.479   3.301  1.00  0.00           C  
ATOM     10  C   LIG L   2     -26.206 -17.677   4.660  1.00  0.00           C  
ATOM     11  C   LIG L   2     -26.438 -18.969   5.147  1.00  0.00           C  
ATOM     12  N   LIG L   2     -27.862 -22.514  -0.889  1.00  0.00           N  
ATOM     13  C   LIG L   2     -27.742 -23.613  -1.826  1.00  0.00           C  
ATOM     14  C   LIG L   2     -26.824 -23.222  -2.975  1.00  0.00           C  

I ran the following code:

awk 'FNR==NR {
    split($0,a,/[[:space:]]*/)
    b[a[2]]=a[1]
    next
}
{
    n=split($0,d,/[^[:space:]]*/)
    if(b[$2])
        $3=b[$3]
    for(i=1;i<=n;i++)
        printf("%s%s",d[i],$i)
    print ""
}' ori_file2.pdb new_file1.pdb

and got this result:

ATOM      1  ATOM   LIG L   2     -28.497 -21.375   1.835  1.00  0.00           N  
ATOM      2  ATOM   LIG L   2     -27.282 -21.191   1.068  1.00  0.00           C  
ATOM      3  ATOM   LIG L   2     -27.048 -22.391   0.162  1.00  0.00           C  
ATOM      4  ATOM   LIG L   2     -26.148 -23.191   0.408  1.00  0.00           O  
ATOM      5  ATOM   LIG L   2     -26.071 -21.047   1.977  1.00  0.00           C  
ATOM      6  ATOM   LIG L   2     -26.119 -19.866   2.917  1.00  0.00           C  
ATOM      7  ATOM   LIG L   2     -26.393 -20.064   4.275  1.00  0.00           C  
ATOM      8  ATOM   LIG L   2     -25.887 -18.575   2.430  1.00  0.00           C  
ATOM      9  ATOM   LIG L   2     -25.932 -17.479   3.301  1.00  0.00           C  
ATOM     10  ATOM   LIG L   2     -26.206 -17.677   4.660  1.00  0.00           C  
ATOM     11  ATOM   LIG L   2     -26.438 -18.969   5.147  1.00  0.00           C  
ATOM     12  ATOM   LIG L   2     -27.862 -22.514  -0.889  1.00  0.00           N  
ATOM     13  ATOM   LIG L   2     -27.742 -23.613  -1.826  1.00  0.00           C  
ATOM     14  ATOM   LIG L   2     -26.824 -23.222  -2.975  1.00  0.00           C  

However, this is the desired result:

MODEL 1
COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.90
ATOM      1  N   PHE A   1     -28.497 -21.375   1.835  1.00  0.00           N  
ATOM      2  CA  PHE A   1     -27.282 -21.191   1.068  1.00  0.00           C  
ATOM      3  C   PHE A   1     -27.048 -22.391   0.162  1.00  0.00           C  
ATOM      4  O   PHE A   1     -26.148 -23.191   0.408  1.00  0.00           O  
ATOM      5  CB  PHE A   1     -26.071 -21.047   1.977  1.00  0.00           C  
ATOM      6  CG  PHE A   1     -26.119 -19.866   2.917  1.00  0.00           C  
ATOM      7  CD2 PHE A   1     -26.393 -20.064   4.275  1.00  0.00           C  
ATOM      8  CD1 PHE A   1     -25.887 -18.575   2.430  1.00  0.00           C  
ATOM      9  CE1 PHE A   1     -25.932 -17.479   3.301  1.00  0.00           C  
ATOM     10  CZ  PHE A   1     -26.206 -17.677   4.660  1.00  0.00           C  
ATOM     11  CE2 PHE A   1     -26.438 -18.969   5.147  1.00  0.00           C  
ATOM     12  N   PHE A   2     -27.862 -22.514  -0.889  1.00  0.00           N  
ATOM     13  CA  PHE A   2     -27.742 -23.613  -1.826  1.00  0.00           C  
ATOM     14  C   PHE A   2     -26.824 -23.222  -2.975  1.00  0.00           C

I want to preserve the file structure of file2 for downstream analysis.

1 answer:

Answer 0 (score: 0)

If you are not worried about preserving the spacing, then:

One-liner:

awk 'FNR==NR{a[$1,$2]=$3 FS $4 FS $5 FS $6;next}(($1,$2) in a){split(a[$1,$2],t);$3=t[1];$4=t[2];$5=t[3];$6=t[4]}1' ori_file2.pub ori_file1.pub 

Piped through column -t:

[akshay@db1 tmp]$ awk 'FNR==NR{a[$1,$2]=$3 FS $4 FS $5 FS $6;next}(($1,$2) in a){split(a[$1,$2],t);$3=t[1];$4=t[2];$5=t[3];$6=t[4]}1' ori_file2.pub ori_file1.pub  | column -t
MODEL   1                                                                          
COMPND  UNNAMED                                                                    
AUTHOR  GENERATED  BY   OPEN  BABEL  2.3.90                                        
ATOM    1          N    PHE   A      1       -28.497  -21.375  1.835   1.00  0.00  N
ATOM    2          CA   PHE   A      1       -27.282  -21.191  1.068   1.00  0.00  C
ATOM    3          C    PHE   A      1       -27.048  -22.391  0.162   1.00  0.00  C
ATOM    4          O    PHE   A      1       -26.148  -23.191  0.408   1.00  0.00  O
ATOM    5          CB   PHE   A      1       -26.071  -21.047  1.977   1.00  0.00  C
ATOM    6          CG   PHE   A      1       -26.119  -19.866  2.917   1.00  0.00  C
ATOM    7          CD2  PHE   A      1       -26.393  -20.064  4.275   1.00  0.00  C
ATOM    8          CD1  PHE   A      1       -25.887  -18.575  2.430   1.00  0.00  C
ATOM    9          CE1  PHE   A      1       -25.932  -17.479  3.301   1.00  0.00  C
ATOM    10         CZ   PHE   A      1       -26.206  -17.677  4.660   1.00  0.00  C
ATOM    11         CE2  PHE   A      1       -26.438  -18.969  5.147   1.00  0.00  C
ATOM    12         N    PHE   A      2       -27.862  -22.514  -0.889  1.00  0.00  N
ATOM    13         CA   PHE   A      2       -27.742  -23.613  -1.826  1.00  0.00  C
ATOM    14         C    PHE   A      2       -26.824  -23.222  -2.975  1.00  0.00  C

More readable:

awk 'FNR==NR{
         a[$1,$2]=$3 FS $4 FS $5 FS $6;
         next
     }
     (($1,$2) in a){
         split(a[$1,$2],t);
         $3=t[1]; $4=t[2]; $5=t[3]; $6=t[4]
     }1
    ' ori_file2.pub ori_file1.pub 
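Both versions rely on awk's multi-subscript arrays: a[$1,$2] stores the value under the single key $1 SUBSEP $2, and (($1,$2) in a) tests for that same combined key. A small standalone demo of that behaviour (composite_key_demo is just an illustrative name, not part of the answer):

```shell
#!/bin/sh
# Demo of awk composite keys: a[x, y] is stored as a[x SUBSEP y],
# and (x, y) in a tests for that same combined key.
composite_key_demo() {
  awk 'BEGIN {
    a["ATOM", "1"] = "N PHE A 1"            # same as a["ATOM" SUBSEP "1"]
    if (("ATOM", "1") in a) print "hit"     # composite membership test
    if (!(("ATOM", "2") in a)) print "miss" # key not present
    for (k in a) {
      split(k, parts, SUBSEP)               # recover the two key parts
      print parts[1], parts[2]
    }
  }'
}

composite_key_demo
```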

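To preserve the whitespace (this needs GNU awk, because the fourth argument to split() is a gawk extension):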

awk 'FNR==NR{
       a[$1,$2]=$3 FS $4 FS $5 FS $6;
       next 
     }
     (($1,$2) in a){
       n=split($0,arr,FS,d);
       split(a[$1,$2],t);
       $3=t[1];$4=t[2];$5=t[3];$6=t[4]; 
       for(i=1;i<=n;i++)
          printf "%s%s", $(i),(i<n? d[i] : ORS);
       next
      }1
      ' ori_file2.pub ori_file1.pub 
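For awks without gawk's four-argument split(), a rough sketch of the same idea, assuming fields are separated only by spaces and tabs, walks each matching line with the POSIX match() function and keeps every separator as-is. replace_cols is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: same lookup as the answer, but portable to POSIX awk.
# Assumes fields are separated by spaces/tabs only.
# Usage: replace_cols ORIG_FILE NEW_FILE
replace_cols() {
  awk '
    FNR == NR { a[$1, $2] = $3 FS $4 FS $5 FS $6; next }  # lookup from first file
    (($1, $2) in a) {
      split(a[$1, $2], t)                 # t[1..4]: new values for fields 3-6
      rest = $0; out = ""; i = 0
      while (match(rest, /[^ \t]+/)) {
        i++
        sep = substr(rest, 1, RSTART - 1)     # whitespace before this field
        fld = substr(rest, RSTART, RLENGTH)   # the field itself
        if (i >= 3 && i <= 6) fld = t[i - 2]  # swap in fields 3-6
        out = out sep fld
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest                          # keep any trailing whitespace
      next
    }
    1                                         # non-matching lines unchanged
  ' "$1" "$2"
}
```

For example, replace_cols ori_file2.pdb new_file1.pdb. Like the gawk version, it keeps the original gap after each field, so a longer value such as CA pushes the following columns one character to the right.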

Or even, using GNU awk's patsplit() (tested on GNU Awk 4.2.1):

awk 'FNR==NR{
       a[$1,$2]=$3 FS $4 FS $5 FS $6;
       next 
     }
     (($1,$2) in a){
       n=patsplit($0, arr, FPAT, d);
       split(a[$1,$2],t);
       $3=t[1]; $4=t[2]; $5=t[3]; $6=t[4]; 
       for(i=1;i<=n;i++)
          printf "%s%s", $(i),(i<n? d[i] : ORS);
       next
      }1
      ' ori_file2.pub ori_file1.pub 

Test results:

[akshay@db1 tmp]$ cat ori_file1.pub 
MODEL 1
COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.90
ATOM      1  N   LIG L   2     -28.497 -21.375   1.835  1.00  0.00           N  
ATOM      2  C   LIG L   2     -27.282 -21.191   1.068  1.00  0.00           C  
ATOM      3  C   LIG L   2     -27.048 -22.391   0.162  1.00  0.00           C  
ATOM      4  O   LIG L   2     -26.148 -23.191   0.408  1.00  0.00           O  
ATOM      5  C   LIG L   2     -26.071 -21.047   1.977  1.00  0.00           C  
ATOM      6  C   LIG L   2     -26.119 -19.866   2.917  1.00  0.00           C  
ATOM      7  C   LIG L   2     -26.393 -20.064   4.275  1.00  0.00           C  
ATOM      8  C   LIG L   2     -25.887 -18.575   2.430  1.00  0.00           C  
ATOM      9  C   LIG L   2     -25.932 -17.479   3.301  1.00  0.00           C  
ATOM     10  C   LIG L   2     -26.206 -17.677   4.660  1.00  0.00           C  
ATOM     11  C   LIG L   2     -26.438 -18.969   5.147  1.00  0.00           C  
ATOM     12  N   LIG L   2     -27.862 -22.514  -0.889  1.00  0.00           N  
ATOM     13  C   LIG L   2     -27.742 -23.613  -1.826  1.00  0.00           C  
ATOM     14  C   LIG L   2     -26.824 -23.222  -2.975  1.00  0.00           C  

[akshay@db1 tmp]$ cat ori_file2.pub 
HELIX    1   1 PHE A    2  ALA A    7  1                                   6
ATOM      1  N   PHE A   1      -3.631  -3.776  -2.910  1.00  0.00           N
ATOM      2  CA  PHE A   1      -2.182  -3.776  -2.910  1.00  0.00           C
ATOM      3  C   PHE A   1      -1.659  -2.347  -2.910  1.00  0.00           C
ATOM      4  O   PHE A   1      -0.766  -2.011  -2.135  1.00  0.00           O
ATOM      5  CB  PHE A   1      -1.630  -4.477  -4.142  1.00  0.00           C
ATOM      6  CG  PHE A   1      -1.888  -5.964  -4.196  1.00  0.00           C
ATOM      7  CD2 PHE A   1      -1.053  -6.844  -3.498  1.00  0.00           C
ATOM      8  CD1 PHE A   1      -2.962  -6.461  -4.943  1.00  0.00           C
ATOM      9  CE1 PHE A   1      -3.201  -7.840  -4.993  1.00  0.00           C
ATOM     10  CZ  PHE A   1      -2.366  -8.721  -4.295  1.00  0.00           C
ATOM     11  CE2 PHE A   1      -1.292  -8.223  -3.548  1.00  0.00           C
ATOM     12  N   PHE A   2      -2.218  -1.506  -3.783  1.00  0.00           N
ATOM     13  CA  PHE A   2      -1.808  -0.119  -3.881  1.00  0.00           C
ATOM     14  C   PHE A   2      -1.962   0.568  -2.532  1.00  0.00           C

[akshay@db1 tmp]$ awk 'FNR==NR{
       a[$1,$2]=$3 FS $4 FS $5 FS $6;
       next 
     }
     (($1,$2) in a){
       n=split($0,arr,FS,d);
       split(a[$1,$2],t);
       $3=t[1];$4=t[2];$5=t[3];$6=t[4]; 
       for(i=1;i<=n;i++)
          printf "%s%s", $(i),(i<n? d[i] : ORS);
       next
      }1
      ' ori_file2.pub ori_file1.pub 
MODEL 1
COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.90
ATOM      1  N   PHE A   1     -28.497 -21.375   1.835  1.00  0.00           N
ATOM      2  CA   PHE A   1     -27.282 -21.191   1.068  1.00  0.00           C
ATOM      3  C   PHE A   1     -27.048 -22.391   0.162  1.00  0.00           C
ATOM      4  O   PHE A   1     -26.148 -23.191   0.408  1.00  0.00           O
ATOM      5  CB   PHE A   1     -26.071 -21.047   1.977  1.00  0.00           C
ATOM      6  CG   PHE A   1     -26.119 -19.866   2.917  1.00  0.00           C
ATOM      7  CD2   PHE A   1     -26.393 -20.064   4.275  1.00  0.00           C
ATOM      8  CD1   PHE A   1     -25.887 -18.575   2.430  1.00  0.00           C
ATOM      9  CE1   PHE A   1     -25.932 -17.479   3.301  1.00  0.00           C
ATOM     10  CZ   PHE A   1     -26.206 -17.677   4.660  1.00  0.00           C
ATOM     11  CE2   PHE A   1     -26.438 -18.969   5.147  1.00  0.00           C
ATOM     12  N   PHE A   2     -27.862 -22.514  -0.889  1.00  0.00           N
ATOM     13  CA   PHE A   2     -27.742 -23.613  -1.826  1.00  0.00           C
ATOM     14  C   PHE A   2     -26.824 -23.222  -2.975  1.00  0.00           C

[akshay@db1 tmp]$ awk 'FNR==NR{
       a[$1,$2]=$3 FS $4 FS $5 FS $6;
       next 
     }
     (($1,$2) in a){
       n=patsplit($0, arr, FPAT, d);
       split(a[$1,$2],t);
       $3=t[1]; $4=t[2]; $5=t[3]; $6=t[4]; 
       for(i=1;i<=n;i++)
          printf "%s%s", $(i),(i<n? d[i] : ORS);
       next
      }1
      ' ori_file2.pub ori_file1.pub 
MODEL 1
COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.90
ATOM      1  N   PHE A   1     -28.497 -21.375   1.835  1.00  0.00           N
ATOM      2  CA   PHE A   1     -27.282 -21.191   1.068  1.00  0.00           C
ATOM      3  C   PHE A   1     -27.048 -22.391   0.162  1.00  0.00           C
ATOM      4  O   PHE A   1     -26.148 -23.191   0.408  1.00  0.00           O
ATOM      5  CB   PHE A   1     -26.071 -21.047   1.977  1.00  0.00           C
ATOM      6  CG   PHE A   1     -26.119 -19.866   2.917  1.00  0.00           C
ATOM      7  CD2   PHE A   1     -26.393 -20.064   4.275  1.00  0.00           C
ATOM      8  CD1   PHE A   1     -25.887 -18.575   2.430  1.00  0.00           C
ATOM      9  CE1   PHE A   1     -25.932 -17.479   3.301  1.00  0.00           C
ATOM     10  CZ   PHE A   1     -26.206 -17.677   4.660  1.00  0.00           C
ATOM     11  CE2   PHE A   1     -26.438 -18.969   5.147  1.00  0.00           C
ATOM     12  N   PHE A   2     -27.862 -22.514  -0.889  1.00  0.00           N
ATOM     13  CA   PHE A   2     -27.742 -23.613  -1.826  1.00  0.00           C
ATOM     14  C   PHE A   2     -26.824 -23.222  -2.975  1.00  0.00           C
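Because PDB is a fixed-column format, another option (not part of the answer above) is to copy a character range instead of whitespace-separated fields. This sketch assumes both files follow the standard ATOM/HETATM column layout, in which columns 13-26 hold the atom name, alternate location, residue name, chain ID and residue number. It still keys on the first two whitespace-separated fields, as the answer does. replace_pdb_cols is a hypothetical name:

```shell
#!/bin/sh
# Sketch assuming strict fixed-column PDB ATOM/HETATM records:
#   cols 13-16 atom name, 17 altLoc, 18-20 residue name,
#   21 blank, 22 chain ID, 23-26 residue number.
# Copies columns 13-26 from the matching line of the first file and
# leaves every other character of the second file untouched.
# Usage: replace_pdb_cols ORIG_FILE NEW_FILE
replace_pdb_cols() {
  awk '
    FNR == NR { if (/^(ATOM  |HETATM)/) a[$1, $2] = substr($0, 13, 14); next }
    /^(ATOM  |HETATM)/ && (($1, $2) in a) {
      print substr($0, 1, 12) a[$1, $2] substr($0, 27)   # splice cols 13-26
      next
    }
    1                                                    # other lines as-is
  ' "$1" "$2"
}
```

For example, replace_pdb_cols ori_file2.pdb new_file1.pdb. Since the whole 14-character slice is copied verbatim, CA occupies the same columns as in ori_file2.pdb and the coordinates stay where they were, which is the layout shown in the desired result in the question.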