I have a data file that looks like this:
HETATM 1 H10 XSHQ 0 10.139 2.231 0.091 1.00 0.00 H
HETATM 2 N1 XSHQ 0 9.641 1.386 -0.104 1.00 0.00 N
HETATM 3 H9 XSHQ 0 9.773 1.133 -1.063 1.00 0.00 H
HETATM 4 C1 XSHQ 0 8.245 1.531 0.230 1.00 0.00 H
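The XYZ coordinates are in columns 6, 7 and 8, and the letter associated with each point is in the last column. I want to find the distances between the points that have the letter H in the last column. How can I do this? I know this is the code I need for the calculation, but I'm confused about how to use the values in columns 6, 7 and 8, and only for the rows whose last column is H: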
from scipy.spatial import distance
dst = distance.euclidean(a,b)
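For reference, columns 6, 7 and 8 (counting from 1) become indices 5 to 7 after `str.split()`, so the slice `[5:8]` picks out x, y and z. Below is a minimal sketch that applies `distance.euclidean` to two of the sample lines above. The helper name `xyz` and the inline strings are illustrative only.

```python
from scipy.spatial import distance

# Two sample lines copied from the data shown above
line_a = "HETATM 1 H10 XSHQ 0 10.139 2.231 0.091 1.00 0.00 H"
line_b = "HETATM 3 H9 XSHQ 0 9.773 1.133 -1.063 1.00 0.00 H"

def xyz(line):
    """Return the x, y, z floats from columns 6-8 (indices 5-7 after split)."""
    return [float(v) for v in line.split()[5:8]]

a, b = xyz(line_a), xyz(line_b)
dst = distance.euclidean(a, b)
print(round(dst, 6))
```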
Answer 1 (score: 0)
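Of course, @Silencer's answer is already correct, and using a data structure like OrderedDict is a good idea. But if you only want to use basic built-in tools, you can try: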
from scipy.spatial import distance
# Load data from file: one list of whitespace-separated tokens per line
with open('datafile.txt') as datafile:
    contents = [line.split() for line in datafile]
# Extract the xyz coordinates (columns 6-8) and the 1-based line number of every row with H in the last column
coords = []
for i, item in enumerate(contents):
    if item and item[-1] == 'H':  # "item and" skips blank lines
        coords.append([[float(x) for x in item[5:8]], i + 1])
# Show the distance for every pair of H points
for i in range(len(coords)):
    for j in range(i + 1, len(coords)):
        dist = distance.euclidean(coords[i][0], coords[j][0])
        print('({}, {}): {:.5f}'.format(coords[i][1], coords[j][1], dist))
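Here is an alternative sketch that is not part of the answer above. It swaps the nested loops for `scipy.spatial.distance.pdist`, which computes all pairwise distances in one call. The result comes back in the same (i, j), i < j order that the nested loops visit. The sketch assumes the whitespace-separated layout from the question, and the inline `data` string stands in for the file.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

# Stand-in for the file contents (same rows as in the question)
data = """HETATM 1 H10 XSHQ 0 10.139 2.231 0.091 1.00 0.00 H
HETATM 2 N1 XSHQ 0 9.641 1.386 -0.104 1.00 0.00 N
HETATM 3 H9 XSHQ 0 9.773 1.133 -1.063 1.00 0.00 H
HETATM 4 C1 XSHQ 0 8.245 1.531 0.230 1.00 0.00 H"""

rows = [line.split() for line in data.splitlines() if line.strip()]
# Keep the 1-based row number and xyz floats for rows whose last column is H
h_rows = [(n, [float(v) for v in row[5:8]])
          for n, row in enumerate(rows, start=1) if row[-1] == "H"]
labels = [n for n, _ in h_rows]
xyz = np.array([coords for _, coords in h_rows])

# Condensed distance vector, ordered like combinations(range(len(xyz)), 2)
dists = pdist(xyz, metric="euclidean")

for (i, j), d in zip(combinations(labels, 2), dists):
    print("({}, {}): {:.5f}".format(i, j, d))
```

For a handful of points either version is fine. `pdist` mainly helps when there are many points, because the pairwise loop then runs inside SciPy instead of in Python.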
Answer 2 (score: 0)
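A simple solution uses generator expressions. From the Rationale section of PEP 289 -- Generator Expressions:
"Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time."
It also uses combinations from the itertools standard library module, because you want to compute the distance for every pair of points of interest in the dataset.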
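This sketch illustrates the two tools named above, using illustrative values. `combinations(iterable, 2)` yields each unordered pair exactly once. A generator, by contrast, can be consumed only once. According to the equivalent code in the itertools documentation, `combinations` reads its whole input into a tuple first. That is why passing it a generator such as `H_data` works, but the same generator is empty if you try to reuse it afterwards.

```python
from itertools import combinations

labels = [1, 3, 4]  # illustrative point numbers for the H rows

# Every unordered pair exactly once, in input order
pairs = list(combinations(labels, 2))
print(pairs)

# A generator expression produces its values lazily and only once
gen = (n * 10 for n in labels)
first_pass = list(combinations(gen, 2))   # combinations consumes the whole generator
second_pass = list(combinations(gen, 2))  # the generator is already exhausted
print(len(first_pass), len(second_pass))
```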
$ cat euclid.py
from scipy.spatial.distance import euclidean
from itertools import combinations
lines = ['HETATM 1 H10 XSHQ 0 10.139 2.231 0.091 1.00 0.00 H',
'HETATM 2 N1 XSHQ 0 9.641 1.386 -0.104 1.00 0.00 N',
'HETATM 3 H9 XSHQ 0 9.773 1.133 -1.063 1.00 0.00 H',
'HETATM 4 C1 XSHQ 0 8.245 1.531 0.230 1.00 0.00 H']
H_lines = (line for line in lines if line[-1]=='H')
H_lists = (line.split() for line in H_lines)
H_data = ((int(tok[1]), [float(x) for x in tok[5:8]]) for tok in H_lists)
H_dist = {(i[0], j[0]): euclidean(i[1], j[1])
          for i, j in combinations(H_data, 2)}
for m, n in H_dist:
    print('Distance between points %d and %d is %.6f' % (
        m, n, H_dist[m, n]))
$ python3 euclid.py
Distance between points 1 and 3 is 1.634404
Distance between points 1 and 4 is 2.023995
Distance between points 3 and 4 is 2.040842
$
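The script above works on an inline list. When the lines are read from a file, each one keeps its trailing newline, so `line[-1] == 'H'` would compare against `'\n'` and match nothing. Below is a sketch of the same generator pipeline adapted to file input. It checks the last whitespace-separated token instead of the last character. The function name `h_distances` is hypothetical, and `io.StringIO` stands in for `open('datafile.txt')` so the example runs on its own.

```python
import io
from itertools import combinations

from scipy.spatial.distance import euclidean

def h_distances(lines):
    """Map (point, point) -> distance for every pair of rows whose last column is H."""
    tokens = (line.split() for line in lines)  # split() also drops the newline
    # Skip blank lines and keep only rows whose last token is H
    h_tokens = (tok for tok in tokens if tok and tok[-1] == "H")
    h_data = ((int(tok[1]), [float(x) for x in tok[5:8]]) for tok in h_tokens)
    return {(i[0], j[0]): euclidean(i[1], j[1])
            for i, j in combinations(h_data, 2)}

# Stand-in for open('datafile.txt'); a real file object is iterated the same way
sample = io.StringIO(
    "HETATM 1 H10 XSHQ 0 10.139 2.231 0.091 1.00 0.00 H\n"
    "HETATM 2 N1 XSHQ 0 9.641 1.386 -0.104 1.00 0.00 N\n"
    "HETATM 3 H9 XSHQ 0 9.773 1.133 -1.063 1.00 0.00 H\n"
    "HETATM 4 C1 XSHQ 0 8.245 1.531 0.230 1.00 0.00 H\n"
)
result = h_distances(sample)
for (m, n), d in result.items():
    print("Distance between points %d and %d is %.6f" % (m, n, d))
```

With a real file you would call it as `with open('datafile.txt') as f: result = h_distances(f)`.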