Need an awk program to count the heavy atoms in a structure file (MOL2 file)

Asked: 2013-10-10 06:08:36

Tags: awk

I have a 'bestranking.lst' in this format:

 37.55       6.00      24.98       0.00      -2.80      -3.90   26.675  './gold_soln_CB_FragLib_Controls_m1_9.mol2'                    'ethyl'
 38.45       1.39      27.36       0.00      -0.56      -2.48   22.724  './gold_soln_CB_FragLib_Controls_m2_6.mol2'  'pyridin-2-yl(pyridin-3-yl)methanone'
 38.47       0.00      28.44       0.00      -0.64      -2.42   20.387  './gold_soln_CB_FragLib_Controls_m3_3.mol2'  'pyridin-2-yl(pyridin-4-yl)methanone'
 42.49       0.07      30.87       0.00      -0.03      -3.24   22.903  './gold_soln_CB_FragLib_Controls_m4_5.mol2'  '(3-chlorophenyl)(pyridin-3-yl)methanone'
 38.20       1.47      27.53       0.00      -1.13      -3.28   22.858  './gold_soln_CB_FragLib_Controls_m5_2.mol2'  'dipyridin-4-ylmethanone'

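Column 9 holds the name of the molecule. Column 8 holds the corresponding Mol2 structure file for that molecule.

I need an awk program that counts the total number of HEAVY atoms in each molecule. The atom names are in column 2 of each Mol2 file. An opened Mol2 file looks like this: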

  1 C1          75.9844  97.5040  19.3570 C.ar           1 SUB               -0.0695
  2 C2          74.9992  96.8780  20.1442 C.ar           1 SUB               -0.1625
  3 C3          75.3743  95.9247  21.1091 C.ar           1 SUB               -0.0561
  4 C4          76.7311  95.5991  21.2853 C.ar           1 SUB               -0.1359
  5 C5          77.7134  96.2252  20.4983 C.ar           1 SUB               -0.0708
  6 C6          77.3397  97.1775  19.5344 C.ar           1 SUB               -0.1411
  7 C7          73.5585  97.2251  19.9557 C.2            1 SUB                0.7353
  8 N8          72.7698  97.3734  21.0597 N.2            1 SUB               -0.6704
  9 C9          71.6047  97.8943  20.9482 C.2            1 SUB                0.5895
 10 N10         70.7604  98.0475  22.1854 N.4            1 SUB               -0.6099
 11 C11         69.8867  96.8655  22.4153 C.ar           1 SUB               -0.0016
 12 C12         70.0298  96.1021  23.5863 C.ar           1 SUB               -0.1438
 13 C13         69.2027  94.9861  23.8019 C.ar           1 SUB               -0.0494
 14 C14         68.2349  94.6340  22.8465 C.ar           1 SUB               -0.1913
 15 C15         68.0885  95.3951  21.6742 C.ar           1 SUB                0.2110
 16 C16         68.9160  96.5114  21.4595 C.ar           1 SUB               -0.1465
 17 S17         70.9482  98.4291  19.4875 S.3            1 SUB               -0.2097
 18 O18         73.0950  97.3706  18.8479 O.2            1 SUB               -0.5679
 19 O19         67.1788  95.0628  20.7807 O.3            1 SUB               -0.4957
 20 H20         75.7049  98.2370  18.6140 H              1 SUB                0.1406
 21 H21         74.6259  95.4380  21.7188 H              1 SUB                0.1556
 22 H22         77.0170  94.8679  22.0255 H              1 SUB                0.1541
 23 H23         78.7539  95.9739  20.6351 H              1 SUB                0.1510
 24 H24         78.0936  97.6579  18.9305 H              1 SUB                0.1485
 25 H25         70.7725  96.3698  24.3234 H              1 SUB                0.1557
 26 H26         69.3139  94.4004  24.7027 H              1 SUB                0.1708
 27 H27         67.6005  93.7759  23.0159 H              1 SUB                0.1642
 28 H28         68.8033  97.0984  20.5601 H              1 SUB                0.1648
 29 H29         71.4000  98.2003  23.0547 H              1 SUB                0.4930
 30 H30         70.1464  98.9429  22.1082 H              1 SUB                0.4930
 31 H31         66.9217  95.7038  20.1074 H              1 SUB                0.4777
 32 H32         69.7912  99.0852  19.3144 H              1 SUB                0.3173
 33 ****        73.1012  97.0758  21.9550 LP             1 SUB                0.0000
 34 ****        73.6781  97.2587  18.0433 LP             1 SUB                0.0000
 35 ****        72.1288  97.6029  18.7367 LP             1 SUB                0.0000
 36 ****        66.3497  94.8209  21.2848 LP             1 SUB                0.0000
 37 ****        67.5235  94.2568  20.2995 LP             1 SUB                0.0000

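I want the program to open each mol2 file and count only the heavy atoms in column 2 (i.e. C, N, O, S, etc.; not hydrogens and not the **** entries).

I would like the output to have the same format as the 'bestranking.lst' file, but with one additional column showing the total heavy atom count for each molecule. So far I can only output the "molecule name" and the "heavy atom count" for each molecule.

Thanks in advance.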

1 answer:

Answer 0: (score: 1)

Try the following bash script:

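# Collect the MOL2 paths from column 8 of the ranking list
files=$(awk -f extractFiles.awk bestranking.lst)
cnt=""
# $files is split on whitespace here, so the paths must not contain spaces
for file in $files ; do
    cnt=$cnt$(awk -f comp.awk "$file")":"
done
# Hand the colon-separated counts to addCol.awk
awk -v cnt="$cnt" -f addCol.awk bestranking.lst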

where extractFiles.awk is:

{
    gsub(/'/,"",$8)
    print $8
}

comp.awk

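# Count lines whose atom name (column 2) is neither a hydrogen nor a **** entry
! (($2 ~ /^H/) || ($2 ~ /^\*\*\*/)) {i++}
# i+0 prints 0 instead of an empty string when nothing was counted
END {print i+0}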

addCol.awk

BEGIN {
    split(cnt,a,":")
}
{ print a[NR], $0 }
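
The three scripts above can also be folded into a single awk pass that opens each Mol2 file with getline. The following is only a minimal sketch under stated assumptions: the function name add_heavy_counts is hypothetical, each Mol2 file is assumed to contain nothing but atom records (as in the listing in the question), and the count is appended as the last column rather than printed first.

```shell
# Hypothetical helper: append a heavy-atom count to every line of a ranking list.
# Assumption: column 8 is a quoted path to a file that holds only atom records.
add_heavy_counts() {
    awk -v q="'" '
    {
        file = $8
        gsub(q, "", file)                      # strip the single quotes around the path
        n = 0
        while ((getline line < file) > 0) {    # read the Mol2 file line by line
            m = split(line, f, " ")            # " " splits on runs of whitespace
            # same test as comp.awk: name starts with neither H nor *
            if (m >= 2 && f[2] !~ /^H/ && f[2] !~ /^\*/) n++
        }
        close(file)                            # release the file before the next record
        print $0, n                            # original line plus the count
    }' "$1"
}
```

Because column 8 stores relative paths, it would have to be run from the directory that holds the .mol2 files, for example: add_heavy_counts bestranking.lst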

Update

Following the comments: to count the heavy atoms only within a given section of the file, try changing comp.awk to:

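# Start counting at the ATOM section header and stop at the BOND section header
/@<TRIPOS>ATOM/ { count=1;  next}
/@<TRIPOS>BOND/ { count=0}
count && ! (($2 ~ /^H/) || ($2 ~ /^\*\*\*/)) {i++}
# i+0 prints 0 instead of an empty string when nothing was counted
END {print i+0}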
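
Matching on the atom name has one side effect: /^H/ also rejects any heavy atom whose name happens to begin with H (a mercury atom named HG1, for instance). A possible variant, assuming column 6 holds the atom type as in the listing in the question (H for hydrogens, LP for the **** entries), tests that column instead. The function name count_heavy is hypothetical, and this is a sketch rather than a drop-in replacement.

```shell
# Hypothetical helper: print the number of heavy atoms in one Mol2 file.
# Assumption: column 6 is the atom type (e.g. C.ar, N.2, H, LP).
count_heavy() {
    awk '
    # Any section header switches counting on only if it is the ATOM section
    /^@<TRIPOS>/ { in_atoms = ($0 ~ /^@<TRIPOS>ATOM/); next }
    # Inside ATOM: count records whose type is not H, H.something or LP
    in_atoms && NF >= 6 && $6 !~ /^(H|LP)([.]|$)/ { n++ }
    # n + 0 prints 0 rather than an empty line when nothing matched
    END { print n + 0 }
    ' "$1"
}
```

It could replace the awk -f comp.awk call in the loop, e.g. cnt=$cnt$(count_heavy "$file")":"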