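I'm stuck on a fairly simple problem that I couldn't figure out after searching online. I have a file containing the xyz coordinates of different chemical structures. I need to read the file, separate all the coordinates, and save them in a list of lists, where each structure is itself stored as a list of lists.
Here is the file content: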
4
C:\Users\i4has\Desktop\Test\xa01.pdb
C 11.74100 -0.16400 11.81700
H 10.89900 -0.07300 11.12700
H 12.06500 0.84600 12.09300
H 11.37900 -0.66000 12.72200
4
C:\Users\i4has\Desktop\Test\xa01.pdb
C 10.85300 -0.88000 9.22400
H 10.72700 0.08200 8.72200
H 10.13800 -0.92800 10.05300
H 10.59300 -1.66500 8.51000
4
C:\Users\i4has\Desktop\Test\xa01.pdb
C 11.24200 -2.12500 9.34300
H 10.31400 -1.67400 8.98400
H 11.00100 -2.76200 10.20000
H 11.63100 -2.76500 8.54700
4
C:\Users\i4has\Desktop\Test\xa01.pdb
C 10.27500 -0.28000 10.38600
H 10.06700 0.36400 11.24300
H 9.67000 -1.18700 10.48700
H 9.94500 0.24600 9.48600
4
C:\Users\i4has\Desktop\Test\xa01.pdb
C 11.30600 1.51100 7.15800
H 11.68900 1.85800 6.19600
H 11.53100 2.27500 7.91000
H 10.21900 1.43100 7.07500
I also don't want the first two lines of each structure's block. Here is the code I tried:
input_file = 'vega_str.xyz'
open_file = open(input_file, 'r')
first_lett = str(open_file.readline())
print(first_lett)
conformers = []
geom = []
f = open(input_file, 'r')
for line in f:
    if line.find(first_lett) == 1:
        del geom[:]
        readStructure = f.__next__()
        while True:
            readStructure = f.__next__()
            if readStructure.find(first_lett) == -1:
                readStructure = readStructure.split()
                geom.append(readStructure)
            else:
                break
        for i in geom:
            del i[0:3:2]
        conformers.append(geom)
I would like the output to look like this:
conformers = [[['C', 11.74100, -0.16400, 11.81700], ['H', 10.89900, -0.07300,
11.12700], ['H', 12.06500, 0.84600, 12.09300], ['H', 11.37900, -0.66000,
12.72200]], [['C', 10.85300, -0.88000, 9.22400], ['H', 10.72700, 0.08200,
8.72200], ['H', 10.13800, -0.92800, 10.05300], ['H', 10.59300, -1.66500,
8.51000]], ...]
Please help. I would really appreciate it.
Answer 0 (score: 0)
Something like this:
def file_formatter(path):
    with open(path) as f:
        # Split each line on spaces and drop the empty strings that
        # repeated spaces leave behind.
        return [list(
            filter(None, line.strip().split(' '))
        ) for line in f]
file_names = ['test_file', ]
print([file_formatter(name) for name in file_names])
[[['C', '11.30600', '1.51100', '7.15800'], ['H', '11.68900', '1.85800', '6.19600'], ['H', '11.53100', '2.27500', '7.91000'], ['H', '10.21900', '1.43100', '7.07500']]]
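If the input still contains the count and path lines shown in the question, the formatter above returns them along with the atoms. Here is a minimal sketch of one way to keep only atom records, assuming an atom line always has exactly four whitespace-separated fields, the last three of which are numbers. The helper names `is_atom_line` and `atom_records` are made up for illustration.

```python
def is_atom_line(fields):
    """Return True if fields look like [symbol, x, y, z] (assumed layout)."""
    if len(fields) != 4:
        return False
    try:
        # The three coordinates must all parse as numbers.
        [float(value) for value in fields[1:]]
    except ValueError:
        return False
    return True


def atom_records(lines):
    """Split each line on whitespace and keep only atom-like records."""
    records = (line.split() for line in lines)
    return [fields for fields in records if is_atom_line(fields)]


sample = [
    "4",
    r"C:\Users\i4has\Desktop\Test\xa01.pdb",
    "C 11.74100 -0.16400 11.81700",
    "H 10.89900 -0.07300 11.12700",
]
print(atom_records(sample))
```

Note that this gives one flat list of atoms. Grouping them by structure still needs the atom count or some other block marker.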
Answer 1 (score: 0)
If the number specifies how many atoms are listed, then you could try and see where blind trust leads you:
In [19]: import io
In [20]: from itertools import islice
In [22]: source = r"""4
...: C:\Users\i4has\Desktop\Test\xa01.pdb
...: C 11.74100 -0.16400 11.81700
...: H 10.89900 -0.07300 11.12700
...: H 12.06500 0.84600 12.09300
...: H 11.37900 -0.66000 12.72200
...: 4
...: C:\Users\i4has\Desktop\Test\xa01.pdb
...: C 10.85300 -0.88000 9.22400
...: H 10.72700 0.08200 8.72200
...: H 10.13800 -0.92800 10.05300
...: H 10.59300 -1.66500 8.51000
...: 4
...: C:\Users\i4has\Desktop\Test\xa01.pdb
...: C 11.24200 -2.12500 9.34300
...: H 10.31400 -1.67400 8.98400
...: H 11.00100 -2.76200 10.20000
...: H 11.63100 -2.76500 8.54700
...: 4
...: C:\Users\i4has\Desktop\Test\xa01.pdb
...: C 10.27500 -0.28000 10.38600
...: H 10.06700 0.36400 11.24300
...: H 9.67000 -1.18700 10.48700
...: H 9.94500 0.24600 9.48600
...: 4
...: C:\Users\i4has\Desktop\Test\xa01.pdb
...: C 11.30600 1.51100 7.15800
...: H 11.68900 1.85800 6.19600
...: H 11.53100 2.27500 7.91000
...: H 10.21900 1.43100 7.07500 """
Note that itertools.islice takes the same arguments as a regular slice: start, stop and step. Now pretend that my string buffer (io.StringIO) is a file object, which you should wrap in a with block:
In [23]: with io.StringIO(source) as f:
    ...:     file_it = iter(f)
    ...:     result = []
    ...:     for line in file_it:
    ...:         # int(line) parses the whole count line, so counts of 10 or more also work
    ...:         sub = [l.split() for l in islice(file_it, 1, int(line) + 1)]
    ...:         result.append(sub)
    ...:
In [24]: result
Out[24]:
[[['C', '11.74100', '-0.16400', '11.81700'],
['H', '10.89900', '-0.07300', '11.12700'],
['H', '12.06500', '0.84600', '12.09300'],
['H', '11.37900', '-0.66000', '12.72200']],
[['C', '10.85300', '-0.88000', '9.22400'],
['H', '10.72700', '0.08200', '8.72200'],
['H', '10.13800', '-0.92800', '10.05300'],
['H', '10.59300', '-1.66500', '8.51000']],
[['C', '11.24200', '-2.12500', '9.34300'],
['H', '10.31400', '-1.67400', '8.98400'],
['H', '11.00100', '-2.76200', '10.20000'],
['H', '11.63100', '-2.76500', '8.54700']],
[['C', '10.27500', '-0.28000', '10.38600'],
['H', '10.06700', '0.36400', '11.24300'],
['H', '9.67000', '-1.18700', '10.48700'],
['H', '9.94500', '0.24600', '9.48600']],
[['C', '11.30600', '1.51100', '7.15800'],
['H', '11.68900', '1.85800', '6.19600'],
['H', '11.53100', '2.27500', '7.91000'],
['H', '10.21900', '1.43100', '7.07500']]]
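The question asks for numeric coordinates, while the result above keeps them as strings. Here is a sketch of the same islice idea wrapped in a function that also converts the coordinates to float. It assumes each block is a count line, one comment or path line, and then that many "symbol x y z" lines. The function name `read_conformers` and the demo data are made up for illustration.

```python
import io
from itertools import islice


def read_conformers(lines):
    """Group atom lines by structure, trusting each block's atom count.

    Assumed block layout: a count line, one comment/path line, then
    that many atom lines of the form "symbol x y z".
    """
    line_it = iter(lines)
    conformers = []
    for count_line in line_it:
        if not count_line.strip():
            continue  # tolerate blank lines between blocks
        n_atoms = int(count_line)
        next(line_it, None)  # skip the comment/path line
        geom = []
        # islice pulls exactly n_atoms lines from the shared iterator.
        for atom_line in islice(line_it, n_atoms):
            symbol, x, y, z = atom_line.split()
            geom.append([symbol, float(x), float(y), float(z)])
        conformers.append(geom)
    return conformers


demo = io.StringIO(
    "2\n"
    "comment line\n"
    "C 11.74100 -0.16400 11.81700\n"
    "H 10.89900 -0.07300 11.12700\n"
    "1\n"
    "comment line\n"
    "C 10.85300 -0.88000 9.22400\n"
)
print(read_conformers(demo))
```

With a real file, passing the open file object works the same way, for example `with open('vega_str.xyz') as f: conformers = read_conformers(f)`. A malformed atom line raises ValueError here rather than being skipped silently, which seems the safer default when the count is trusted.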
Answer 2 (score: 0)
I'm a Python beginner; I would suggest:
# Read all lines up front; the file is small.
with open("vega_str.xyz", "r") as f:
    lines = f.readlines()
innerlist = []
conformers = []
for line in lines:
    line = line.strip()
    # Skip blank lines, the atom-count line and the path line of each block.
    if not line or line.startswith("4") or line.startswith("C:\\Users"):
        continue
    coordinates = line.split()
    # In this file a "C" atom marks the start of a new structure.
    if coordinates[0] == "C" and innerlist:
        conformers.append(innerlist)
        innerlist = []
    innerlist.append(coordinates)
# Append the last structure, which the loop itself never flushes.
if innerlist:
    conformers.append(innerlist)
print(conformers)
Output:
[[['C', '11.74100', '-0.16400', '11.81700'], ['H', '10.89900', '-0.07300', '11.12700'], ['H', '12.06500', '0.84600', '12.09300'], ['H', '11.37900', '-0.66000', '12.72200']], [['C', '10.85300', '-0.88000', '9.22400'], ['H', '10.72700', '0.08200', '8.72200'], ['H', '10.13800', '-0.92800', '10.05300'], ['H', '10.59300', '-1.66500', '8.51000']], [['C', '11.24200', '-2.12500', '9.34300'], ['H', '10.31400', '-1.67400', '8.98400'], ['H', '11.00100', '-2.76200', '10.20000'], ['H', '11.63100', '-2.76500', '8.54700']], [['C', '10.27500', '-0.28000', '10.38600'], ['H', '10.06700', '0.36400', '11.24300'], ['H', '9.67000', '-1.18700', '10.48700'], ['H', '9.94500', '0.24600', '9.48600']], [['C', '11.30600', '1.51100', '7.15800'], ['H', '11.68900', '1.85800', '6.19600'], ['H', '11.53100', '2.27500', '7.91000'], ['H', '10.21900', '1.43100', '7.07500']]]
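The loop above depends on two details of this particular file: the count line starts with "4", and every structure begins with a "C" atom. If instead one only assumes that every structure has the same number of atoms and the same number of header lines, the lines can be cut into equal blocks. This is a rough sketch of that alternative, not the answer's method; `chunk_conformers` and its parameters are hypothetical names.

```python
def chunk_conformers(lines, n_atoms=4, n_header=2):
    """Cut lines into equal blocks and drop each block's header lines.

    Assumes every structure has n_atoms atoms preceded by n_header lines.
    """
    # Ignore blank lines so a trailing newline does not shift the blocks.
    rows = [line.split() for line in lines if line.strip()]
    block = n_atoms + n_header
    return [rows[start + n_header:start + block]
            for start in range(0, len(rows), block)]


text = """2
comment
C 11.74100 -0.16400 11.81700
H 10.89900 -0.07300 11.12700
2
comment
C 10.85300 -0.88000 9.22400
H 10.72700 0.08200 8.72200
"""
print(chunk_conformers(text.splitlines(), n_atoms=2))
```

For the file in the question the defaults (four atoms, two header lines) would apply. If structures can differ in size, reading the count line of each block is the more robust choice.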