Saving the number of counted residues in a PDB file to a new variable

Asked: 2019-06-24 07:46:44

Tags: variables awk protein-database

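I need to count the number of residues in a PDB file by processing it as plain text. The file contains standard residues (the 20 amino acids) and non-standard residues (defined as WAT, HOH, 0XB, 4XB, ROH). Below is an example of the PDB (starting from the 216th residue of the protein): the 4th column gives the residue type (which I need to declare, see below) and the 6th column gives the residue number (which I need to use for counting the declared residues).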

ATOM   1690  CB  VAL A 216      22.030  -8.161 -21.267  1.00 20.78           C  
ATOM   1691  CG1 VAL A 216      20.950  -8.538 -22.272  1.00 20.90           C  
ATOM   1692  CG2 VAL A 216      21.487  -8.155 -19.837  1.00 21.07           C  
ATOM   1693  N   TYR A 217      23.855  -9.511 -23.682  1.00 20.89           N  
ATOM   1694  CA  TYR A 217      24.587  -9.239 -24.892  1.00 21.41           C  
ATOM   1695  C   TYR A 217      23.884  -9.776 -26.138  1.00 21.96           C  
ATOM   1696  O   TYR A 217      23.019 -10.645 -26.044  1.00 20.66           O  
ATOM   1697  CB  TYR A 217      25.980  -9.861 -24.783  1.00 21.36           C  
ATOM   1698  CG  TYR A 217      25.961 -11.371 -24.730  1.00 20.15           C  
ATOM   1699  CD1 TYR A 217      25.923 -12.123 -25.895  1.00 20.52           C  
ATOM   1700  CD2 TYR A 217      25.972 -12.036 -23.519  1.00 20.44           C  
ATOM   1701  CE1 TYR A 217      25.909 -13.495 -25.858  1.00 21.27           C  
ATOM   1702  CE2 TYR A 217      25.946 -13.405 -23.463  1.00 21.22           C  
ATOM   1703  CZ  TYR A 217      25.917 -14.134 -24.640  1.00 21.52           C  
ATOM   1704  OH  TYR A 217      25.901 -15.501 -24.594  1.00 21.23           O  
ATOM   1705  N   THR A 218      24.288  -9.239 -27.290  1.00 23.07           N  
ATOM   1706  CA  THR A 218      23.893  -9.749 -28.602  1.00 24.36           C  
ATOM   1707  C   THR A 218      25.115 -10.168 -29.224  1.00 24.98           C  
ATOM   1708  O   THR A 218      26.221  -9.703 -29.135  1.00 26.18           O  
ATOM   1709  CB  THR A 218      23.212  -8.685 -29.452  1.00 24.94           C  
ATOM   1710  CG2 THR A 218      21.931  -8.252 -28.844  1.00 25.53           C  
ATOM   1711  OG1 THR A 218      24.091  -7.552 -29.584  1.00 26.97           O  
ATOM   1712  N   THR A 219      24.908 -11.042 -30.371  1.00 25.55           N  
ATOM   1713  CA  THR A 219      25.958 -11.438 -31.312  1.00 25.87           C  
ATOM   1714  C   THR A 219      25.331 -11.567 -32.692  1.00 25.97           C  
ATOM   1715  O   THR A 219      24.231 -12.113 -32.806  1.00 26.00           O  
ATOM   1716  CB  THR A 219      26.590 -12.790 -30.937  1.00 26.02           C  
ATOM   1717  CG2 THR A 219      27.921 -12.977 -31.656  1.00 26.44           C  
ATOM   1718  OG1 THR A 219      26.811 -12.845 -29.529  1.00 27.48           O 
TER 
ATOM   1719  C1  0XB B 220       6.613   3.931 -16.928  1.00 11.35           C  
ATOM   1720  C2  0XB B 220       7.042   5.128 -16.070  1.00 14.60           C  
ATOM   1721  O2  0XB B 220       6.347   5.144 -14.862  1.00 15.67           O  
ATOM   1722  C3  0XB B 220       6.767   6.445 -16.786  1.00 17.91           C  
ATOM   1723  O3  0XB B 220       7.304   7.499 -15.962  1.00 20.75           O  
ATOM   1724  C4  0XB B 220       7.275   6.470 -18.142  1.00 17.97           C  
ATOM   1725  O4  0XB B 220       6.793   7.605 -18.882  1.00 21.45           O  
ATOM   1726  C5  0XB B 220       6.856   5.264 -18.860  1.00 15.05           C  
ATOM   1727  O5  0XB B 220       7.286   4.049 -18.182  1.00 12.43           O  
TER
ATOM   1728  C1  4XB B 221       5.359  -0.924 -15.781  1.00  6.64           C  
ATOM   1729  C2  4XB B 221       6.747  -0.906 -16.375  1.00  6.37           C  
ATOM   1730  O2  4XB B 221       7.571  -1.862 -15.682  1.00  6.70           O  
ATOM   1731  C3  4XB B 221       7.383   0.437 -16.239  1.00  6.53           C  
ATOM   1732  O3  4XB B 221       8.614   0.454 -16.952  1.00  7.17           O  
ATOM   1733  C4  4XB B 221       6.496   1.523 -16.706  1.00  6.86           C  
ATOM   1734  O4  4XB B 221       7.055   2.756 -16.341  1.00  7.93           O  
ATOM   1735  C5  4XB B 221       5.093   1.393 -16.084  1.00  7.13           C  
ATOM   1736  O5  4XB B 221       4.551   0.106 -16.388  1.00  6.67           O 
TER 
ATOM   1737  C1  4XB B 222       3.415  -5.497 -14.442  1.00  7.23           C  
ATOM   1738  C2  4XB B 222       2.312  -4.754 -15.139  1.00  6.50           C  
ATOM   1739  O2  4XB B 222       1.653  -5.586 -16.126  1.00  7.47           O  
ATOM   1740  C3  4XB B 222       2.766  -3.427 -15.802  1.00  6.13           C  
ATOM   1741  O3  4XB B 222       1.591  -2.648 -15.876  1.00  7.02           O  
ATOM   1742  C4  4XB B 222       3.901  -2.748 -15.121  1.00  6.42           C  
ATOM   1743  O4  4XB B 222       4.705  -2.120 -16.113  1.00  6.84           O  
ATOM   1744  C5  4XB B 222       4.809  -3.691 -14.357  1.00  8.21           C  
ATOM   1745  O5  4XB B 222       4.113  -4.581 -13.598  1.00  8.20           O
TER  
ATOM   1746  C1  4XB B 223       0.279  -9.093 -11.950  1.00  7.34           C  
ATOM   1747  C2  4XB B 223       1.016  -9.549 -13.204  1.00  7.02           C  
ATOM   1748  O2  4XB B 223       1.505 -10.863 -12.994  1.00  7.75           O  
ATOM   1749  C3  4XB B 223       2.133  -8.658 -13.519  1.00  6.43           C  
ATOM   1750  O3  4XB B 223       2.629  -8.987 -14.830  1.00  7.19           O  
ATOM   1751  C4  4XB B 223       1.777  -7.230 -13.449  1.00  6.89           C  
ATOM   1752  O4  4XB B 223       2.948  -6.411 -13.418  0.93  7.29           O  
ATOM   1753  C5  4XB B 223       1.002  -6.902 -12.189  1.00  8.10           C  
ATOM   1754  O5  4XB B 223      -0.123  -7.763 -12.080  1.00  7.86           O  
TER
ATOM   1755  C1  4XB B 224      -2.316 -11.723  -8.228  1.00 13.30           C  
ATOM   1756  C2  4XB B 224      -3.173 -10.657  -8.950  1.00 14.61           C  
ATOM   1757  O2  4XB B 224      -4.557 -11.027  -8.952  1.00 16.65           O  
ATOM   1758  C3  4XB B 224      -2.726 -10.237 -10.411  1.00 14.12           C  
ATOM   1759  O3  4XB B 224      -3.502  -9.178 -10.953  1.00 16.48           O  
ATOM   1760  C4  4XB B 224      -1.249  -9.979 -10.435  1.00 10.48           C  
ATOM   1761  O4  4XB B 224      -0.865  -9.856 -11.797  1.00  8.86           O  
ATOM   1762  C5  4XB B 224      -0.606 -11.154  -9.841  1.00 11.33           C  
ATOM   1763  O5  4XB B 224      -0.960 -11.228  -8.458  1.00 11.66           O  
TER
ATOM   1764  C1  4XB B 225      -3.351 -14.867  -4.461  1.00 20.40           C  
ATOM   1765  C2  4XB B 225      -2.077 -14.106  -4.278  1.00 19.49           C  
ATOM   1766  O2  4XB B 225      -1.838 -13.768  -2.832  1.00 21.49           O  
ATOM   1767  C3  4XB B 225      -1.967 -12.786  -4.966  1.00 16.78           C  
ATOM   1768  O3  4XB B 225      -0.720 -12.198  -4.915  1.00 16.51           O  
ATOM   1769  C4  4XB B 225      -2.440 -12.958  -6.498  1.00 15.44           C  
ATOM   1770  O4  4XB B 225      -2.665 -11.826  -7.052  1.00 14.20           O  
ATOM   1771  C5  4XB B 225      -3.741 -13.791  -6.558  1.00 17.51           C  
ATOM   1772  O5  4XB B 225      -3.628 -15.047  -5.837  1.00 19.83           O  
TER
ATOM   1773  O1  ROH B 226      -3.357 -16.125  -3.876  1.00 22.81           O  
TER
ATOM   1774  O   HOH A 227      24.864   6.980  -5.291  1.00  2.00           O  
ATOM   1775  O   HOH A 228      17.275   7.500  -2.455  1.00 14.57           O  

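I need to count the number of residues in the PDB, ignoring all residues named HOH (as in my example) and WAT.

To do this I use the following AWK solution, which counts all residues, including non-standard ones such as 0XB, 4XB, ROH and HOH: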

 awk '{ a[$4 $6 FILENAME]++ }
   END {
     for (i in a) { b[substr(i,1,3)]++ }
     for (i in b)
     {
       total+=b[i]
     }
     printf "\nTotal no:of residues - %d\n", total
   }' file.pdb

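How can I modify this code to i) exclude non-standard residues from the count, and ii) save the number of counted residues in a new variable instead of printing it?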

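For example, assuming I start from the first residue in my example PDB, the number of counted residues in this particular example should be 219 (not 228, i.e. excluding the 2 HOH residues), and this value should be stored in a new variable once the script has run.

Thanks!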
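One possible direction, given only as a minimal sketch and not a tested solution for the full file: count each distinct (residue name, chain, residue number) combination once, skip the residue names listed in a `skip` variable, and capture the awk output with shell command substitution. The names `pdb`, `skip`, `excluded`, `seen` and `residues` are made up for illustration, and the small sample file is built from a few lines of the example above so that the snippet runs on its own. It assumes the columns are separated by whitespace as in the example. PDB is really a fixed-column format, so this can break when the chain ID and residue number run together.

```shell
#!/bin/sh
# Hypothetical sample file, created only so the sketch is self-contained.
pdb=$(mktemp)
cat > "$pdb" <<'EOF'
ATOM   1703  CZ  TYR A 217      25.917 -14.134 -24.640  1.00 21.52           C
ATOM   1704  OH  TYR A 217      25.901 -15.501 -24.594  1.00 21.23           O
ATOM   1705  N   THR A 218      24.288  -9.239 -27.290  1.00 23.07           N
TER
ATOM   1719  C1  0XB B 220       6.613   3.931 -16.928  1.00 11.35           C
ATOM   1773  O1  ROH B 226      -3.357 -16.125  -3.876  1.00 22.81           O
ATOM   1774  O   HOH A 227      24.864   6.980  -5.291  1.00  2.00           O
EOF

# Count distinct (residue name, chain, residue number) triples on ATOM/HETATM
# records only, so TER and blank lines never enter the count.
# Residue names listed in `skip` are ignored.
residues=$(awk -v skip="WAT HOH 0XB 4XB ROH" '
  BEGIN {
    n = split(skip, names, " ")
    for (i = 1; i <= n; i++) excluded[names[i]] = 1
  }
  ($1 == "ATOM" || $1 == "HETATM") && !($4 in excluded) && !seen[$4, $5, $6]++ {
    count++
  }
  END { print count + 0 }   # "+ 0" prints 0 instead of an empty string
' "$pdb")

# The count now lives in the shell variable instead of being printed by awk.
echo "Standard residues: $residues"
rm -f "$pdb"
```

Passing `skip="WAT HOH"` instead would ignore only the water residues. Note that this counts the residues actually present in the file, so a file that starts at residue 216 gives a smaller number than the last residue number.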

0 answers:

No answers yet