我有300个pdb fies有2个链。我想计算第一链的第一个原子与第二个链的所有原子之间的距离。然后,第一个链的第二个原子到第二个链的所有原子。这必须重复300个文件。我只需要在距离> = 5时打印原子对,并将输出保存到具有输入文件名的另一个文件夹。寻找距离的公式是sqrt((x1-x2)^ 2 +(y1-y2)^ 2 +(z1-z2)^ 2)。 $ 5是链ID,$ 12是原子名。 $ 7,$ 8,$ 9是x,y,z坐标。您的宝贵建议将不胜感激!!
ATOM 1 N MET A 1 -16.220 53.312 36.564 1.00 32.19 N
ATOM 2 CA MET A 1 -15.722 52.290 37.522 1.00 28.47 C
ATOM 3 C MET A 1 -14.451 51.635 37.011 1.00 26.82 C
ATOM 2542 CG ASN B 17 -1.077 9.776 13.155 1.00 18.23 C
ATOM 2543 OD1 ASN B 17 -0.563 9.098 12.250 1.00 18.58 O
ATOM 2544 ND2 ASN B 17 -0.632 9.746 14.418 1.00 14.82 N
所需的输出(距离值不正确)
N-C 8.90
N-O 10.3
N-N 7.62
C-C 12.45
C-O 9.0
C-N 9.89
C-C 11.45
C-O 19.0
C-N 10.89
答案 0 :(得分:1)
这是使用GNU awk
的一种方式。像:
awk -f script.awk file{,}
script.awk
NR==1 {
n = $5
}
FNR==NR && $5 != n {
a[c++]=$0
}
FNR!=NR && $5 == n {
for (i=0;i<=c-1;i++) {
split (a[i],b)
dist = sqrt (($7-b[7])^2 + ($8-b[8])^2 + ($9-b[9])^2)
if (dist >= 5) {
printf "%s-%s\t%.2f\n", $NF, b[NF], dist
}
}
}
制表符分隔结果:
N-C 51.70
N-O 52.83
N-N 51.30
C-C 51.14
C-O 52.29
C-N 50.71
C-C 50.00
C-O 51.14
C-N 49.56
或者,这是单行:
awk 'NR==1 { n = $5 } FNR==NR && $5 != n { a[c++]=$0 } FNR!=NR && $5 == n { for (i=0;i<=c-1;i++) { split (a[i],b); dist = sqrt (($7-b[7])^2 + ($8-b[8])^2 + ($9-b[9])^2); if (dist >= 5) printf "%s-%s\t%.2f\n", $NF, b[NF], dist } }' file{,}
因此,要在当前工作目录中的多个文件上执行此操作,并假设此目录中只有感兴趣的文件,您可以围绕for
语句包装awk
循环。显然,您需要将/path/to/folder/
更改为您选择的路径才能正常工作:
for i in *; do awk 'NR==1 { n = $5 } FNR==NR && $5 != n { a[c++]=$0 } FNR!=NR && $5 == n { for (i=0;i<=c-1;i++) { split (a[i],b); dist = sqrt (($7-b[7])^2 + ($8-b[8])^2 + ($9-b[9])^2); if (dist >= 5) printf "%s-%s\t%.2f\n", $NF, b[NF], dist > "/path/to/folder/" FILENAME } }' "$i"{,}; done
答案 1 :(得分:0)
查看Python的MDAnalysis模块 - http://code.google.com/p/mdanalysis/
http://code.google.com/p/mdanalysis/wiki/Examples#Extracting_a_chain_from_a_PDB
答案 2 :(得分:0)
这样的样本数据保存为“test”:
#! /usr/bin/python3.2
import re
class Atom:
def __init__ (self, line):
tokens = re.split (' +', line)
self.chain = tokens [4]
self.x = float (tokens [6] )
self.y = float (tokens [7] )
self.z = float (tokens [8] )
self.name = tokens [11]
def __repr__ (self):
return self.name
def distance (self, other):
return ( (self.x - other.x) ** 2 + (self.y - other.y) ** 2 + (self.z - other.z) ** 2) ** .5
chains = [ [], [] ]
with open ('test', 'r') as pdb:
for line in pdb.readlines ():
atom = Atom (line.strip () )
chains [0 if atom.chain == 'A' else 1].append (atom)
print ('\n'.join ( ['{} - {}\t{}'.format (a, b, a.distance (b) ) for a in chains [0] for b in chains [1] ] ) )
产生类似的东西:
N - C 51.697920905970676
N - O 52.8317143484858
N - N 51.29744063791097
C - C 51.14359109409506
C - O 52.28783920760161
C - N 50.709908814747436
C - C 50.001484907950484
C - O 51.140786403808846
C - N 49.559022700210704
修改强>
我忘记了您的&gt; = 5.0条件:只需将最后一行替换为:
print ('\n'.join ( ['{} - {}\t{}'.format (a, b, a.distance (b) ) for a in chains [0] for b in chains [1] if a.distance (b) >= 5.0] ) )