Question

I have an .xyz file for H2S. If I read the file like this:

with open('H2S.xyz','r') as stream:
    for line in stream:
        print(line)

I get:

3

XYZ file of the hydrogen sulphide molecule

S                  0.00000000    0.00000000    0.10224900

H                  0.00000000    0.96805900   -0.81799200

H                  0.00000000   -0.96805900   -0.81799200

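The first line gives the number of atoms, and the last three lines give the coordinates of those atoms.

I need to write some code that extracts the position of every atom in the molecule as a list, where each element is another list holding that atom's coordinates.

If I do this: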

with open('H2S.xyz','r') as stream:
    new=list(stream)
new

I get each line as one element of a list, and if I do this:

with open('H2S.xyz','r') as stream:
    new_list=[]
    for line in stream:
        new_list=new_list+line.split()
new_list

I get every whitespace-separated token as a separate element:

['3','XYZ','file','of','the','hydrogen','sulphide','molecule','S',
'0.00000000','0.00000000','0.10224900','H','0.00000000','0.96805900',
'-0.81799200','H','0.00000000','-0.96805900','-0.81799200']

which is not what I want. The list I want looks like this:

[['0.00000000','0.00000000','0.10224900'],
['0.00000000','0.96805900','-0.81799200'],
['0.00000000','-0.96805900','-0.81799200']]

But I'm not sure how to write the code for that.

Answer 1

This function should give you the correct output.

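def parse_xyz(file_name):
    output = []
    with open(file_name) as infile:
        data = infile.readlines()
    for row in data[2:]:  # Throw away the first two lines (atom count and comment)
        fields = row.split()
        if fields:  # Skip blank lines
            output.append(fields[1:])  # Throw away the first column (element symbol)
    return output


result = parse_xyz('H2S.xyz')
print(result)

Some notes on what it does:

First, I wrapped the code in a function. This is usually preferred because it means you can repeat the process with other files, e.g. result = parse_xyz('h2o.xyz')
for row in data[2:]: is list slicing, which skips the first two lines so that we don't capture anything from them.
We use slice notation again inside the for loop: fields[1:] discards the first column of each row we record. Slicing the split fields rather than the raw string (row[1:]) matters, because row[1:] would only drop the first character, which breaks for indented lines or two-letter element symbols.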
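To see the two kinds of slicing in isolation, here is a small sketch that works on an in-memory list of lines instead of a file. The sample lines and the helper name coordinates_from_lines are made up for illustration; the "Cl" row is only there to show what happens with a two-letter symbol.

```python
# Hypothetical sample data standing in for the lines of an .xyz file.
sample_lines = [
    "3\n",
    "XYZ file of the hydrogen sulphide molecule\n",
    "S   0.00000000    0.00000000    0.10224900\n",
    "H   0.00000000    0.96805900   -0.81799200\n",
    "H   0.00000000   -0.96805900   -0.81799200\n",
]


def coordinates_from_lines(lines):
    """Return the coordinate columns of every atom line as lists of strings."""
    coords = []
    for row in lines[2:]:              # list slice: skip count and comment lines
        fields = row.split()           # split on any run of whitespace
        if fields:                     # ignore blank lines
            coords.append(fields[1:])  # column slice: drop the element symbol
    return coords


# Slicing the string removes one character; slicing the fields removes one column.
char_slice = "Cl  1.0  2.0  3.0"[1:].split()
col_slice = "Cl  1.0  2.0  3.0".split()[1:]

print(char_slice)   # still contains a leftover 'l'
print(col_slice)    # coordinates only
print(coordinates_from_lines(sample_lines))
```

Slicing after split() does not depend on how wide the first column is or whether the line is indented.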

Answer 2

I would do something like this:

import re
with open("file.txt", "r") as f:
    rows = [re.split(r"\s+", x.strip()) for x in f]
    # keep only rows with exactly four fields: symbol, x, y, z
    print([row for row in rows if len(row) == 4])

[['S', '0.00000000', '0.00000000', '0.10224900'], ['H', '0.00000000', '0.96805900', '-0.81799200'], ['H', '0.00000000', '-0.96805900', '-0.81799200']]

Answer 3

Read all the lines of the .xyz file, split each atom line into the element and its position, and append the position to a list.

H2S.xyz

    3
XYZ file of the hydrogen sulphide molecule
    S       0.00000000      0.00000000      0.10224900
    H       0.00000000      0.96805900     -0.81799200
    H       0.00000000     -0.96805900     -0.81799200

Code

with open('H2S.xyz') as data:
    lines=data.readlines()                  # read all lines
    new_list = []
    for atom in lines[2:]:                  # start from third line
        position = atom.split()             # get the values
        new_list.append(position[1:])       # append only the positions

print(new_list)

Your list

[['0.00000000', '0.00000000', '0.10224900'],
['0.00000000', '0.96805900', '-0.81799200'],
['0.00000000', '-0.96805900', '-0.81799200']]
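If you want numbers rather than strings, one possible extension (not part of the answer above) is to use the atom count on the first line to decide how many lines to read, and to convert each coordinate with float(). The function name read_positions is hypothetical, and the sketch assumes the plain count / comment / atom-lines layout shown above. To stay self-contained, it writes the example molecule to a temporary file first.

```python
import os
import tempfile


def read_positions(file_name):
    """Return [[x, y, z], ...] as floats for the atoms in an .xyz file."""
    with open(file_name) as data:
        lines = data.readlines()
    n_atoms = int(lines[0].split()[0])    # first line: number of atoms
    positions = []
    for atom in lines[2:2 + n_atoms]:     # skip the count and comment lines
        fields = atom.split()
        # fields[0] is the element symbol; fields[1:4] are x, y, z
        positions.append([float(value) for value in fields[1:4]])
    return positions


# Demo data: the H2S file from the question.
xyz_text = """    3
XYZ file of the hydrogen sulphide molecule
    S       0.00000000      0.00000000      0.10224900
    H       0.00000000      0.96805900     -0.81799200
    H       0.00000000     -0.96805900     -0.81799200
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "H2S.xyz")
    with open(path, "w") as out:
        out.write(xyz_text)
    positions = read_positions(path)

print(positions)
```

Reading exactly n_atoms lines means trailing blank lines in the file do not produce empty entries in the result.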

Extracting positions from a list (Python)
