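I have this PDB file, and I want to calculate the distances between atoms 7 and 8 ($2) and atoms 12, 14, 15, 17 and 18. If a distance is below 5, the value should be printed: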
ATOM 1 N ASN p 140 38.455 18.232 -3.207 1.00 7.39 N
ATOM 2 CA ASN p 140 37.856 18.151 -4.534 1.00 7.91 C
ATOM 3 C ASN p 140 38.700 18.848 -5.595 1.00 10.75 C
ATOM 4 O ASN p 140 39.797 19.271 -5.313 1.00 9.25 O
ATOM 5 CB ASN p 140 36.435 18.715 -4.446 1.00 7.62 C
ATOM 6 CG ASN p 140 35.556 17.898 -3.501 1.00 6.82 C
ATOM 7 OD1 ASN p 140 35.269 18.315 -2.323 1.00 8.53 O
ATOM 8 ND2 ASN p 140 35.197 16.691 -3.945 1.00 5.41 N
TER 9 ASN 140
HETATM 10 C 08H p 1 29.121 15.727 -1.182 1.00 5.89 C
HETATM 11 C 08H p 1 29.763 16.230 -0.040 1.00 5.86 C
HETATM 12 N 08H p 1 31.023 16.810 -0.046 1.00 6.15 N
HETATM 13 C 08H p 1 31.533 17.872 0.633 1.00 6.24 C
HETATM 14 N 08H p 1 32.815 18.037 0.299 1.00 6.83 N
HETATM 15 N 08H p 1 33.151 17.112 -0.526 1.00 7.37 C
HETATM 16 C 08H p 1 32.058 16.349 -0.758 1.00 7.06 C
HETATM 17 O 08H p 1 31.956 15.215 -1.730 1.00 8.15 O
HETATM 18 N 08H p 1 30.979 15.691 -2.746 1.00 10.31 N
HETATM 19 C 08H p 1 29.651 15.777 -2.509 1.00 6.71 C
HETATM 20 O HOH p 170 34.699 19.032 2.134 1.00 6.42 O
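The scripts in this post compute the plain Euclidean distance from the x, y, z columns ($7, $8, $9). PDB coordinates are in ångströms, so the cutoff of 5 means 5 Å. Worked through by hand for atoms 7 (OD1 of ASN 140) and 14 (N of ligand 08H), the formula gives the same value as the 7-14 line in the script output further down:

```latex
d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}

d_{7,14} = \sqrt{(35.269 - 32.815)^2 + (18.315 - 18.037)^2 + (-2.323 - 0.299)^2}
         = \sqrt{2.454^2 + 0.278^2 + (-2.622)^2}
         \approx \sqrt{6.022 + 0.077 + 6.875}
         = \sqrt{12.974}
         \approx 3.602
```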
Based on a similar script, I wrote this code:
# usage: awk -f test.awk structure.pdb
BEGIN{print "asparagine and ligand in the structure..."; ORS=""}
$1=="ATOM" && $3~"ND2|OD1" && $4=="ASN" || $1=="HETATM" && $12~"N|O" && $4!~"HOH" {
    print $2,$3,$4,$6"\n"
    atm_x[$2]=$7; atm_y[$2]=$8; atm_z[$2]=$9
}
END{ ORS="\n"
    for (key1 in atm_x) {
        done[key1]=1    # record key1 as processed (exact key, so 7 is not confused with 17)
        for (key2 in atm_x) {
            if (key2 in done) continue    # skip self-pairs and pairs already computed
            dx=atm_x[key1]-atm_x[key2]
            dy=atm_y[key1]-atm_y[key2]
            dz=atm_z[key1]-atm_z[key2]
            distance=sqrt(dx^2+dy^2+dz^2)
            if (distance < 5 && distance != 0 ) {
                i++
                candidate[i]=key1"-"key2": "distance
            }
        }
    }
    print "\nCandidates ..."
    for (keys in candidate) {print candidate[keys]}
}
When I run this script, I get the following result:
asparagine and ligand in the structure...
7 OD1 ASN 140
8 ND2 ASN 140
12 N 08H 1
14 N 08H 1
17 O 08H 1
18 N 08H 1
Candidates ...
7-8: 2.2964
7-14: 3.60198
7-17: 4.57576
8-17: 4.19391
8-18: 4.49768
12-14: 2.19905
12-17: 2.50007
12-18: 2.92303
14-17: 3.58028
14-18: 4.25989
17-18: 1.48774
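The problem is that I don't want to print the distance when both atoms have the same residue name ($4). I'm new to awk and would like to know the best way to handle this. Any suggestions would be greatly appreciated!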
Answer 0 (score: 0)
awk '
# ASN side-chain OD1/ND2 atoms, plus ligand N/O atoms (HETATM, water excluded)
($1=="ATOM" && ($3=="ND2" || $3=="OD1") && $4=="ASN") || \
($1=="HETATM" && ($12=="N" || $12 =="O") && $4!="HOH") {
    atom[$2] = 1
    x[$2] = $7
    y[$2] = $8
    z[$2] = $9
    name[$2] = $4    # residue name, compared below
}
END {
    for (a in atom) {
        for (b in atom) {
            # a > b keeps one ordering of each pair; equal residue names are skipped
            if (a > b && name[a] != name[b]) {
                dist = sqrt((x[a]-x[b])^2 + (y[a]-y[b])^2 + (z[a]-z[b])^2)
                if (dist < 5)
                    printf "%s-%s: %.4f\n", a, b, dist
            }
        }
    }
}
' pdbfile
7-17: 4.5758
7-14: 3.6020
8-17: 4.1939
8-18: 4.4977
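The question literally asks for distances between two fixed sets of atom serial numbers (7, 8 against 12, 14, 15, 17, 18) rather than filtering by residue name. A minimal sketch of that variant follows; `pair_dist` is a hypothetical helper name, and it assumes whitespace-separated records laid out exactly as in the sample (chain ID present, so x, y, z are $7, $8, $9). Real PDB files are fixed-column, so a blank chain ID or touching coordinate columns would shift the fields. Selecting by serial number also picks up atom 15, whose element column ($12) is C in this file and which the $12 filters in both scripts therefore drop.

```shell
# pair_dist FILE GROUP_A GROUP_B [CUTOFF]
# Hypothetical helper: prints the distance between every serial in GROUP_A and
# every serial in GROUP_B (comma-separated lists) when it is below CUTOFF (default 5).
# Assumes whitespace-separated records with a chain ID, so x, y, z are $7, $8, $9.
pair_dist() {
    awk -v groupA="$2" -v groupB="$3" -v cutoff="${4:-5}" '
        BEGIN {
            na = split(groupA, A, ",")
            nb = split(groupB, B, ",")
        }
        # remember the coordinates of every atom record, keyed by serial number ($2)
        $1 == "ATOM" || $1 == "HETATM" {
            x[$2] = $7; y[$2] = $8; z[$2] = $9
        }
        END {
            for (i = 1; i <= na; i++) {
                for (j = 1; j <= nb; j++) {
                    a = A[i]; b = B[j]
                    # skip serial numbers that were not found in the file
                    if (!(a in x) || !(b in x)) continue
                    d = sqrt((x[a]-x[b])^2 + (y[a]-y[b])^2 + (z[a]-z[b])^2)
                    if (d < cutoff) printf "%s-%s: %.4f\n", a, b, d
                }
            }
        }' "$1"
}
```

For the file in the question this would be called as `pair_dist structure.pdb 7,8 12,14,15,17,18`, with an optional fourth argument to change the cutoff. Looping over the split lists by index, rather than with `for (k in ...)`, keeps the output in the order the serial numbers were given.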