I have a very large text file containing 1339018 lines, and I want to extract three sections from it:
My FILE.txt:
.
.
.
-----------------------
first ATOMIC CHARGES
-----------------------
0 C : -0.157853
1 C : -0.156875
2 C : -0.143714
3 C : -0.140489
4 S : 0.058926
5 H : 0.128758
6 H : 0.128814
7 H : 0.142420
8 H : 0.140013
My charges : -0.0000000
------------------------
.
..
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
0 C : -0.208137 0.054313
1 C : -0.206691 0.053890
2 C : -0.266791 0.395830
3 C : -0.262729 0.395691
4 S : -0.184730 0.179002
5 H : 0.023341 -0.009535
6 H : 0.023405 -0.009489
7 H : 0.042728 -0.029862
8 H : 0.039605 -0.029841
My charges : -1.0000000
------------------------
.
.
.
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
0 C : -0.086045 0.075562
1 C : -0.085256 0.075871
2 C : 0.022683 0.483590
3 C : 0.025286 0.483583
4 S : 0.246328 -0.079498
5 H : 0.215005 -0.003936
6 H : 0.215043 -0.003948
7 H : 0.224379 -0.015598
8 H : 0.222578 -0.015627
My charges : 1.0000000
------------------------
.
.
.
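I tried the script below in order to extract the fourth column of each section and convert it into a list, for example:
oX = [-0.157853, -0.156875, -0.143714, ...]
oY = [-0.208137, -0.206691, ...]
oZ = [-0.086045, -0.085256, ...]
Unfortunately, the third print statement does not work.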
with open('FILE.txt', 'rb') as f:
    textfile_temp = f.read()

print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges : -0.0000000")[0]
print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges : -1.0000000")[0]
print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges : 1.0000000")[0]
Is this possible?
Answer 0 (score: 2)
Try one subtle change in the last line, as shown below. You were very close!
with open('FILE.txt', 'rb') as f:
    textfile_temp = f.read()

print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges : -0.0000000")[0]
print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges : -1.0000000")[0]
print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[2].split("My charges : 1.0000000")[0]
# The only change is the index after the first split in the last line: [2] instead of [1].
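To see why the index matters: splitting on the header string returns the text before the first header as piece 0, the text after the first header as piece 1, and the text after the second header as piece 2. The footer "My charges : 1.0000000" only occurs in piece 2, so splitting piece 1 on it finds nothing and returns that piece whole. The following is a minimal Python 3 sketch of that behavior, not part of the answer itself; the shortened `text` sample and the helper `charges` are hypothetical stand-ins for the real file and assume the row layout shown in the question.

```python
# Shortened, hypothetical stand-in for the contents of FILE.txt
text = """first ATOMIC CHARGES
0 C : -0.157853
My charges : -0.0000000
first ATOMIC CHARGES AND SPIN
0 C : -0.208137 0.054313
My charges : -1.0000000
first ATOMIC CHARGES AND SPIN
0 C : -0.086045 0.075562
My charges : 1.0000000
"""

pieces = text.split('first ATOMIC CHARGES AND SPIN')
# pieces[0]: everything before the first "AND SPIN" header
# pieces[1]: text after the first "AND SPIN" header
# pieces[2]: text after the second "AND SPIN" header


def charges(piece, footer='My charges'):
    """Return the fourth column of the atom rows that precede the footer."""
    body = piece.split(footer)[0]
    result = []
    for line in body.splitlines():
        fields = line.split()
        # Atom rows look like: index, element, ':', charge[, spin]
        if len(fields) >= 4 and fields[2] == ':':
            result.append(float(fields[3]))
    return result


oY = charges(pieces[1])
oZ = charges(pieces[2])
print(oY)
print(oZ)
```

Because the helper splits on the common prefix "My charges" rather than on the full footer line, it does not depend on the exact total charge printed in each block.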
Answer 1 (score: 1)
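You can use a regular expression to extract the required values:
import re

data = []
block = []

with open('input.txt') as f_input:
    for row in f_input:
        # Atom rows look like "0 C : -0.157853"; capture the first number after the colon
        values = re.findall(r'^\s*\d+\s+\w+\s*:\s*(-?\d+\.\d+)', row)
        if values:
            block.append(float(values[0]))
        elif row.startswith('first ATOMIC') and block:
            # A new header closes the block collected so far
            data.append(block)
            block = []

# Store the last block, which is not followed by another header
if block:
    data.append(block)

oX, oY, oZ = data
print(oX)
print(oY)
print(oZ)
This prints the three lists oX, oY and oZ, one per section. The file is read line by line, so it never has to be held in memory as a whole.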
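The same line-by-line idea can also be written without a regular expression, which may be easier to adapt if the real file contains other numeric rows between the sections. This is only a sketch in Python 3 under the assumption that every section starts with a "first ATOMIC CHARGES" header, ends with a "My charges" line, and has whitespace-separated atom rows as in the question; the function name `read_charge_blocks` and the `sample` list are hypothetical.

```python
def read_charge_blocks(lines):
    """Collect the charge column of every 'first ATOMIC CHARGES' section."""
    blocks = []
    current = None  # None means we are currently outside a section
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('first ATOMIC CHARGES'):
            current = []  # header found: start a new section
        elif current is not None and stripped.startswith('My charges'):
            blocks.append(current)  # footer found: close the section
            current = None
        elif current is not None:
            fields = stripped.split()
            # Atom rows look like: index, element, ':', charge[, spin]
            if len(fields) >= 4 and fields[2] == ':':
                current.append(float(fields[3]))
    return blocks


# Shortened, hypothetical sample; with the real file, pass the open
# file object instead, e.g. read_charge_blocks(open('FILE.txt')).
sample = [
    "-----------------------",
    "first ATOMIC CHARGES",
    "-----------------------",
    "0 C : -0.157853",
    "4 S : 0.058926",
    "My charges : -0.0000000",
    "some unrelated line 1.5 2.5",
    "first ATOMIC CHARGES AND SPIN",
    "0 C : -0.208137 0.054313",
    "My charges : -1.0000000",
    "first ATOMIC CHARGES AND SPIN",
    "0 C : -0.086045 0.075562",
    "My charges : 1.0000000",
]

oX, oY, oZ = read_charge_blocks(sample)
```

Tracking whether the parser is inside a section means rows outside the header and footer are ignored even if they happen to contain numbers.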