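I need to count the number of residues in a PDB file by processing it as plain text. The file contains both standard residues (the 20 amino acids) and non-standard residues (defined as WAT, HOH, 0XB, 4XB, ROH). Below is an example PDB excerpt (starting from residue 216 of the protein): column 4 gives the residue type (which I need to filter on, see below) and column 6 gives the residue number (which I need to use for counting):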
ATOM 1690 CB VAL A 216 22.030 -8.161 -21.267 1.00 20.78 C
ATOM 1691 CG1 VAL A 216 20.950 -8.538 -22.272 1.00 20.90 C
ATOM 1692 CG2 VAL A 216 21.487 -8.155 -19.837 1.00 21.07 C
ATOM 1693 N TYR A 217 23.855 -9.511 -23.682 1.00 20.89 N
ATOM 1694 CA TYR A 217 24.587 -9.239 -24.892 1.00 21.41 C
ATOM 1695 C TYR A 217 23.884 -9.776 -26.138 1.00 21.96 C
ATOM 1696 O TYR A 217 23.019 -10.645 -26.044 1.00 20.66 O
ATOM 1697 CB TYR A 217 25.980 -9.861 -24.783 1.00 21.36 C
ATOM 1698 CG TYR A 217 25.961 -11.371 -24.730 1.00 20.15 C
ATOM 1699 CD1 TYR A 217 25.923 -12.123 -25.895 1.00 20.52 C
ATOM 1700 CD2 TYR A 217 25.972 -12.036 -23.519 1.00 20.44 C
ATOM 1701 CE1 TYR A 217 25.909 -13.495 -25.858 1.00 21.27 C
ATOM 1702 CE2 TYR A 217 25.946 -13.405 -23.463 1.00 21.22 C
ATOM 1703 CZ TYR A 217 25.917 -14.134 -24.640 1.00 21.52 C
ATOM 1704 OH TYR A 217 25.901 -15.501 -24.594 1.00 21.23 O
ATOM 1705 N THR A 218 24.288 -9.239 -27.290 1.00 23.07 N
ATOM 1706 CA THR A 218 23.893 -9.749 -28.602 1.00 24.36 C
ATOM 1707 C THR A 218 25.115 -10.168 -29.224 1.00 24.98 C
ATOM 1708 O THR A 218 26.221 -9.703 -29.135 1.00 26.18 O
ATOM 1709 CB THR A 218 23.212 -8.685 -29.452 1.00 24.94 C
ATOM 1710 CG2 THR A 218 21.931 -8.252 -28.844 1.00 25.53 C
ATOM 1711 OG1 THR A 218 24.091 -7.552 -29.584 1.00 26.97 O
ATOM 1712 N THR A 219 24.908 -11.042 -30.371 1.00 25.55 N
ATOM 1713 CA THR A 219 25.958 -11.438 -31.312 1.00 25.87 C
ATOM 1714 C THR A 219 25.331 -11.567 -32.692 1.00 25.97 C
ATOM 1715 O THR A 219 24.231 -12.113 -32.806 1.00 26.00 O
ATOM 1716 CB THR A 219 26.590 -12.790 -30.937 1.00 26.02 C
ATOM 1717 CG2 THR A 219 27.921 -12.977 -31.656 1.00 26.44 C
ATOM 1718 OG1 THR A 219 26.811 -12.845 -29.529 1.00 27.48 O
TER
ATOM 1719 C1 0XB B 220 6.613 3.931 -16.928 1.00 11.35 C
ATOM 1720 C2 0XB B 220 7.042 5.128 -16.070 1.00 14.60 C
ATOM 1721 O2 0XB B 220 6.347 5.144 -14.862 1.00 15.67 O
ATOM 1722 C3 0XB B 220 6.767 6.445 -16.786 1.00 17.91 C
ATOM 1723 O3 0XB B 220 7.304 7.499 -15.962 1.00 20.75 O
ATOM 1724 C4 0XB B 220 7.275 6.470 -18.142 1.00 17.97 C
ATOM 1725 O4 0XB B 220 6.793 7.605 -18.882 1.00 21.45 O
ATOM 1726 C5 0XB B 220 6.856 5.264 -18.860 1.00 15.05 C
ATOM 1727 O5 0XB B 220 7.286 4.049 -18.182 1.00 12.43 O
TER
ATOM 1728 C1 4XB B 221 5.359 -0.924 -15.781 1.00 6.64 C
ATOM 1729 C2 4XB B 221 6.747 -0.906 -16.375 1.00 6.37 C
ATOM 1730 O2 4XB B 221 7.571 -1.862 -15.682 1.00 6.70 O
ATOM 1731 C3 4XB B 221 7.383 0.437 -16.239 1.00 6.53 C
ATOM 1732 O3 4XB B 221 8.614 0.454 -16.952 1.00 7.17 O
ATOM 1733 C4 4XB B 221 6.496 1.523 -16.706 1.00 6.86 C
ATOM 1734 O4 4XB B 221 7.055 2.756 -16.341 1.00 7.93 O
ATOM 1735 C5 4XB B 221 5.093 1.393 -16.084 1.00 7.13 C
ATOM 1736 O5 4XB B 221 4.551 0.106 -16.388 1.00 6.67 O
TER
ATOM 1737 C1 4XB B 222 3.415 -5.497 -14.442 1.00 7.23 C
ATOM 1738 C2 4XB B 222 2.312 -4.754 -15.139 1.00 6.50 C
ATOM 1739 O2 4XB B 222 1.653 -5.586 -16.126 1.00 7.47 O
ATOM 1740 C3 4XB B 222 2.766 -3.427 -15.802 1.00 6.13 C
ATOM 1741 O3 4XB B 222 1.591 -2.648 -15.876 1.00 7.02 O
ATOM 1742 C4 4XB B 222 3.901 -2.748 -15.121 1.00 6.42 C
ATOM 1743 O4 4XB B 222 4.705 -2.120 -16.113 1.00 6.84 O
ATOM 1744 C5 4XB B 222 4.809 -3.691 -14.357 1.00 8.21 C
ATOM 1745 O5 4XB B 222 4.113 -4.581 -13.598 1.00 8.20 O
TER
ATOM 1746 C1 4XB B 223 0.279 -9.093 -11.950 1.00 7.34 C
ATOM 1747 C2 4XB B 223 1.016 -9.549 -13.204 1.00 7.02 C
ATOM 1748 O2 4XB B 223 1.505 -10.863 -12.994 1.00 7.75 O
ATOM 1749 C3 4XB B 223 2.133 -8.658 -13.519 1.00 6.43 C
ATOM 1750 O3 4XB B 223 2.629 -8.987 -14.830 1.00 7.19 O
ATOM 1751 C4 4XB B 223 1.777 -7.230 -13.449 1.00 6.89 C
ATOM 1752 O4 4XB B 223 2.948 -6.411 -13.418 0.93 7.29 O
ATOM 1753 C5 4XB B 223 1.002 -6.902 -12.189 1.00 8.10 C
ATOM 1754 O5 4XB B 223 -0.123 -7.763 -12.080 1.00 7.86 O
TER
ATOM 1755 C1 4XB B 224 -2.316 -11.723 -8.228 1.00 13.30 C
ATOM 1756 C2 4XB B 224 -3.173 -10.657 -8.950 1.00 14.61 C
ATOM 1757 O2 4XB B 224 -4.557 -11.027 -8.952 1.00 16.65 O
ATOM 1758 C3 4XB B 224 -2.726 -10.237 -10.411 1.00 14.12 C
ATOM 1759 O3 4XB B 224 -3.502 -9.178 -10.953 1.00 16.48 O
ATOM 1760 C4 4XB B 224 -1.249 -9.979 -10.435 1.00 10.48 C
ATOM 1761 O4 4XB B 224 -0.865 -9.856 -11.797 1.00 8.86 O
ATOM 1762 C5 4XB B 224 -0.606 -11.154 -9.841 1.00 11.33 C
ATOM 1763 O5 4XB B 224 -0.960 -11.228 -8.458 1.00 11.66 O
TER
ATOM 1764 C1 4XB B 225 -3.351 -14.867 -4.461 1.00 20.40 C
ATOM 1765 C2 4XB B 225 -2.077 -14.106 -4.278 1.00 19.49 C
ATOM 1766 O2 4XB B 225 -1.838 -13.768 -2.832 1.00 21.49 O
ATOM 1767 C3 4XB B 225 -1.967 -12.786 -4.966 1.00 16.78 C
ATOM 1768 O3 4XB B 225 -0.720 -12.198 -4.915 1.00 16.51 O
ATOM 1769 C4 4XB B 225 -2.440 -12.958 -6.498 1.00 15.44 C
ATOM 1770 O4 4XB B 225 -2.665 -11.826 -7.052 1.00 14.20 O
ATOM 1771 C5 4XB B 225 -3.741 -13.791 -6.558 1.00 17.51 C
ATOM 1772 O5 4XB B 225 -3.628 -15.047 -5.837 1.00 19.83 O
TER
ATOM 1773 O1 ROH B 226 -3.357 -16.125 -3.876 1.00 22.81 O
TER
ATOM 1774 O HOH A 227 24.864 6.980 -5.291 1.00 2.00 O
ATOM 1775 O HOH A 228 17.275 7.500 -2.455 1.00 14.57 O
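I need to count the number of residues in the PDB, ignoring all residues named HOH (present in my example) as well as WAT.
For this I am using the following AWK solution, which counts all residues, including non-standard ones such as 0XB, 4XB, ROH and HOH: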
awk '{ a[$4 $6 FILENAME]++ }
END {
    for (i in a) { b[substr(i,1,3)]++ }
    for (i in b) {
        total += b[i]
    }
    printf "\nTotal no:of residues - %d\n", total
}' file.pdb
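How can I modify this code to (i) exclude the non-standard residues from the count, and (ii) store the counted number of residues in a new variable instead of printing it?
For example, assuming the example PDB starts from the first residue, the count for this particular example should be 219 rather than 228, i.e. without the nine non-standard residues 220-228, which include the 2 HOH residues. That value should be available in a new variable after the script has run.
Thanks!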
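One possible direction, as a minimal sketch rather than a definitive solution: let awk skip every line that is not an ATOM/HETATM record (so the TER lines are not counted), skip the residue names on an exclusion list, count each distinct residue once, and capture the printed number with command substitution. The variable names `n_res` and `skip`, the temporary sample file, and the choice to key residues on name, chain and number (columns 4, 5 and 6) are assumptions made for illustration. The sketch also assumes the columns are separated by whitespace as in the example above; real PDB files are fixed-width, and adjacent fields can run together.

```shell
#!/bin/sh
# Hypothetical sample file: a few lines cut down from the example above.
pdb=$(mktemp)
cat > "$pdb" <<'EOF'
ATOM 1690 CB VAL A 216 22.030 -8.161 -21.267 1.00 20.78 C
ATOM 1693 N TYR A 217 23.855 -9.511 -23.682 1.00 20.89 N
ATOM 1694 CA TYR A 217 24.587 -9.239 -24.892 1.00 21.41 C
TER
ATOM 1719 C1 0XB B 220 6.613 3.931 -16.928 1.00 11.35 C
TER
ATOM 1773 O1 ROH B 226 -3.357 -16.125 -3.876 1.00 22.81 O
TER
ATOM 1774 O HOH A 227 24.864 6.980 -5.291 1.00 2.00 O
EOF

# The command substitution stores awk's output in the shell variable n_res.
n_res=$(awk -v skip="WAT HOH 0XB 4XB ROH" '
    BEGIN {
        # Turn the space-separated exclusion list into a lookup table.
        n = split(skip, names, " ")
        for (i = 1; i <= n; i++) excluded[names[i]] = 1
    }
    $1 != "ATOM" && $1 != "HETATM" { next }   # ignore TER and other records
    ($4 in excluded)               { next }   # ignore non-standard residues
    !seen[$4, $5, $6]++            { count++ }  # first atom of a new residue
    END { print count + 0 }                   # prints 0 if nothing matched
' "$pdb")

rm -f "$pdb"
echo "$n_res"   # prints 2 for this sample (VAL 216 and TYR 217)
```

To ignore only water, the list passed with `-v skip=...` could be shortened to `"WAT HOH"`; for a real file, replace `"$pdb"` with the path to it.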