Question

I have a data file that looks like this.

HETATM    1  H10 XSHQ    0      10.139   2.231   0.091  1.00  0.00           H
HETATM    2   N1 XSHQ    0       9.641   1.386  -0.104  1.00  0.00           N
HETATM    3   H9 XSHQ    0       9.773   1.133  -1.063  1.00  0.00           H
HETATM    4   C1 XSHQ    0       8.245   1.531   0.230  1.00  0.00           H

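The XYZ coordinates are in columns 6, 7 and 8, and the letter associated with each point is in the last column. I want to find the distances between the points that have the letter H in the last column. How can I do this? I know this is the code I need to perform the operation, but I am confused about how to use the values in columns 6, 7 and 8, and only for the rows whose last column is H: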

from scipy.spatial import distance    
dst = distance.euclidean(a,b)

Answer 1

I use a regexp to extract the data, then filter it by the rule.

The demo code looks like this:
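A minimal sketch of such a regexp-based approach, assuming every record follows the layout shown in the question (serial number in the second field, coordinates in fields 6 to 8, element symbol last). The pattern `RECORD` and the helper `hydrogen_points` are illustrative names, not the answerer's original code, and `math.dist` from the standard library (Python 3.8+) is used here in place of scipy's `distance.euclidean`:

```python
import math
import re
from itertools import combinations

# Hypothetical pattern: serial number, three skipped fields, x, y, z,
# two skipped fields, then the element symbol at the end of the line.
RECORD = re.compile(
    r"^HETATM\s+(\d+)\s+\S+\s+\S+\s+\S+"
    r"\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
    r"\s+\S+\s+\S+\s+([A-Za-z]+)\s*$"
)

def hydrogen_points(lines):
    """Return (serial, (x, y, z)) for every record whose last column is 'H'."""
    points = []
    for line in lines:
        match = RECORD.match(line)
        # Filter rule: keep only records whose element symbol is exactly 'H'.
        if match and match.group(5) == "H":
            serial = int(match.group(1))
            xyz = tuple(float(match.group(k)) for k in (2, 3, 4))
            points.append((serial, xyz))
    return points

# The sample records from the question.
sample = [
    "HETATM    1  H10 XSHQ    0      10.139   2.231   0.091  1.00  0.00           H",
    "HETATM    2   N1 XSHQ    0       9.641   1.386  -0.104  1.00  0.00           N",
    "HETATM    3   H9 XSHQ    0       9.773   1.133  -1.063  1.00  0.00           H",
    "HETATM    4   C1 XSHQ    0       8.245   1.531   0.230  1.00  0.00           H",
]

# Print the distance for every pair of matching points.
for (a, pa), (b, pb) in combinations(hydrogen_points(sample), 2):
    print("({}, {}): {:.5f}".format(a, b, math.dist(pa, pb)))
```

Lines that do not match the pattern are skipped silently, so headers or blank lines in the file would not raise an error.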

Answer 2

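Of course, @Silencer's answer is already correct, and using a data structure like OrderedDict is a good idea, but if you only want to use standard methods, you could try: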

from scipy.spatial import distance

# Load data from file
with open('datafile.txt') as datafile: 
    contents = [line.split() for line in datafile]

# Extract the xyz coordinates, if there is an H in the last column
coords = []
for i, item in enumerate(contents):
    if item[-1] == 'H':
        coords.append([[float(x) for x in item[5:8]], i+1])

# Show results
for i in range(len(coords)):
    for j in range(i+1, len(coords)):
        dist = distance.euclidean(coords[i][0], coords[j][0])
        print('({}, {}): {:.5f}'.format(coords[i][1], coords[j][1], dist))

Answer 3

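A simple solution using

generator expressions, because you do not need to save intermediate results and you may have a large dataset. From the Rationale of PEP 289 -- Generator Expressions:

   Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.

and combinations, from the itertools standard library module, because you want to compute the distance for each pair of points of interest in your dataset.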

$ cat euclid.py
from scipy.spatial.distance import euclidean
from itertools import combinations

lines = ['HETATM 1 H10 XSHQ 0 10.139 2.231 0.091 1.00 0.00 H',
         'HETATM 2 N1 XSHQ 0 9.641 1.386 -0.104 1.00 0.00 N',
         'HETATM 3 H9 XSHQ 0 9.773 1.133 -1.063 1.00 0.00 H',
         'HETATM 4 C1 XSHQ 0 8.245 1.531 0.230 1.00 0.00 H']

H_lines = (line for line in lines if line[-1]=='H')
H_lists = (line.split() for line in H_lines)
H_data  = ((int(tok[1]), [float(x) for x in tok[5:8]]) for tok in H_lists)
H_dist  = {(i[0], j[0]):euclidean(i[1], j[1]) for i, j in combinations(H_data, 2)}

for m, n in H_dist:
    print('Distance between points %d and %d is %.6f'%( m, n, H_dist[m, n]))
$ python3 euclid.py
Distance between points 1 and 3 is 1.634404
Distance between points 1 and 4 is 2.023995
Distance between points 3 and 4 is 2.040842
$
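When the records are read from a file rather than from a list literal, each line ends with a newline, so the `line[-1]=='H'` test above would compare against '\n' instead of the element symbol. A minimal sketch of a file-based variant that tests the last whitespace-separated field instead. Here `hydrogen_records` is a hypothetical helper, `io.StringIO` stands in for the real file object, and `math.dist` (Python 3.8+) is swapped in for scipy's `euclidean` to keep the example dependency-free:

```python
import io
import math
from itertools import combinations

def hydrogen_records(handle):
    """Lazily yield (serial, [x, y, z]) for records whose last field is 'H'."""
    for line in handle:
        tok = line.split()  # split() also discards the trailing newline
        if tok and tok[-1] == 'H':
            yield int(tok[1]), [float(x) for x in tok[5:8]]

# Stand-in for an opened data file; the contents are the question's sample.
datafile = io.StringIO(
    "HETATM    1  H10 XSHQ    0      10.139   2.231   0.091  1.00  0.00           H\n"
    "HETATM    2   N1 XSHQ    0       9.641   1.386  -0.104  1.00  0.00           N\n"
    "HETATM    3   H9 XSHQ    0       9.773   1.133  -1.063  1.00  0.00           H\n"
    "HETATM    4   C1 XSHQ    0       8.245   1.531   0.230  1.00  0.00           H\n"
)

# Map each pair of serial numbers to the distance between the two points.
H_dist = {(i[0], j[0]): math.dist(i[1], j[1])
          for i, j in combinations(hydrogen_records(datafile), 2)}

for (m, n), d in H_dist.items():
    print('Distance between points %d and %d is %.6f' % (m, n, d))
```

The generator reads the file one line at a time, although `combinations` keeps its own copy of the filtered points internally in order to form the pairs.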

Euclidean distance between xyz coordinates in Python

3 answers: