How to extract a column between two (almost identical) strings using Python

Time: 2017-12-14 18:14:47

Tags: python regex

I have a very large text file containing 1,339,018 lines, and I want to extract three sections from it:

My FILE.txt

.
.
.
-----------------------
first ATOMIC CHARGES
-----------------------
   0 C :   -0.157853
   1 C :   -0.156875
   2 C :   -0.143714
   3 C :   -0.140489
   4 S :    0.058926
   5 H :    0.128758
   6 H :    0.128814
   7 H :    0.142420
   8 H :    0.140013
My charges :   -0.0000000

------------------------
.
..
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.208137    0.054313
   1 C :   -0.206691    0.053890
   2 C :   -0.266791    0.395830
   3 C :   -0.262729    0.395691
   4 S :   -0.184730    0.179002
   5 H :    0.023341   -0.009535
   6 H :    0.023405   -0.009489
   7 H :    0.042728   -0.029862
   8 H :    0.039605   -0.029841
My charges :   -1.0000000

------------------------
.
.
.
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.086045    0.075562
   1 C :   -0.085256    0.075871
   2 C :    0.022683    0.483590
   3 C :    0.025286    0.483583
   4 S :    0.246328   -0.079498
   5 H :    0.215005   -0.003936
   6 H :    0.215043   -0.003948
   7 H :    0.224379   -0.015598
   8 H :    0.222578   -0.015627
My charges :    1.0000000

------------------------
.
.
.

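I tried the script below to extract the fourth column of each section and convert it into a list, for example:

oX = [-0.157853, -0.156875, -0.143714, ...]

oY = [-0.208137, -0.206691, ...]

oZ = [-0.086045, -0.085256, ...]

Unfortunately, the third print statement does not return the third section.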

with open('FILE.txt', 'rb') as f:
     textfile_temp = f.read()
     print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges :   -0.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :   -1.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :    1.0000000")[0]

Is this possible?

2 Answers:

Answer 0: (score: 2)

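Try a subtle change on the last line, as shown below. You were very close! The header 'first ATOMIC CHARGES AND SPIN' occurs twice in the file, so the third section is element [2] of the split result, not [1].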

with open('FILE.txt', 'rb') as f:
     textfile_temp = f.read()
     print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges :   -0.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :   -1.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[2].split("My charges :    1.0000000")[0]
     #                                                          ^ change this
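
To see why the index matters: str.split returns the text before the first occurrence of the marker at index 0 and the text after the n-th occurrence at index n. The following is a minimal Python 3 sketch of that idea, not the answerer's code. The shortened `sample` string and the helper `fourth_column` are hypothetical names introduced for illustration, and splitting on the bare 'My charges' prefix (instead of the full line with its value) is a simplifying assumption.

```python
# Shortened stand-in for FILE.txt (hypothetical sample, two atoms per block).
sample = """\
first ATOMIC CHARGES
-----------------------
   0 C :   -0.157853
   1 C :   -0.156875
My charges :   -0.0000000
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.208137    0.054313
   1 C :   -0.206691    0.053890
My charges :   -1.0000000
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.086045    0.075562
   1 C :   -0.085256    0.075871
My charges :    1.0000000
"""


def fourth_column(section):
    """Return the fourth whitespace-separated field of each atom row as floats."""
    charges = []
    for line in section.splitlines():
        fields = line.split()
        # Atom rows look like: index, element, ':', charge[, spin]
        if len(fields) >= 4 and fields[0].isdigit() and fields[2] == ':':
            charges.append(float(fields[3]))
    return charges


spin_parts = sample.split('first ATOMIC CHARGES AND SPIN')
# spin_parts[0] is everything before the first "AND SPIN" header,
# spin_parts[1] follows the first such header, spin_parts[2] the second.
oX = fourth_column(sample.split('first ATOMIC CHARGES')[1].split('My charges')[0])
oY = fourth_column(spin_parts[1].split('My charges')[0])
oZ = fourth_column(spin_parts[2].split('My charges')[0])

print(oX)
print(oY)
print(oZ)
```

With a real file, `sample` would be replaced by the result of f.read(); for a file of this size that loads everything into memory at once, which may or may not be acceptable.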

Answer 1: (score: 1)

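You can use a regular expression to extract the required values:

import re

data = []    # one list of charges per section
block = []   # charges of the section currently being read

with open('input.txt') as f_input:
    for row in f_input:
        # An atom row starts with whitespace and an index; capture the first decimal number
        values = re.findall(r'\s+\d+.*?(-?\d+\.\d+)', row)

        if values:
            block.append(float(values[0]))
        elif row.startswith('first ATOMIC') and block:
            # A new header closes the previous section
            data.append(block)
            block = []

# Store the last section, which has no header after it
if block:
    data.append(block)

oX, oY, oZ = data

print(oX)
print(oY)
print(oZ)

For the sample file above, this prints the three lists oX, oY and oZ, one per section.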
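
One caveat with the regex loop above is that it appends every row that matches the pattern, so similarly shaped rows elsewhere in a 1.3-million-line file would end up in the lists too. A possible alternative, sketched here in Python 3 under the assumption that every section starts with a 'first ATOMIC CHARGES' header and ends with a 'My charges' line, is to collect rows only while inside a section. The function name `read_charge_blocks` and the in-memory `sample_lines` are hypothetical and used only for illustration.

```python
def read_charge_blocks(lines):
    """Collect the charge column of every 'first ATOMIC CHARGES' section."""
    blocks = []
    current = None  # None while outside a section
    for line in lines:
        if line.startswith('first ATOMIC CHARGES'):
            current = []              # header: start a new section
        elif line.startswith('My charges') and current is not None:
            blocks.append(current)    # footer: close the section
            current = None
        elif current is not None:
            fields = line.split()
            # Atom rows look like: index, element, ':', charge[, spin]
            if len(fields) >= 4 and fields[0].isdigit() and fields[2] == ':':
                current.append(float(fields[3]))
    return blocks


# Hypothetical in-memory sample; the first row lies outside any section.
sample_lines = [
    'some unrelated line    12  3.5\n',
    'first ATOMIC CHARGES\n',
    '-----------------------\n',
    '   0 C :   -0.157853\n',
    '   4 S :    0.058926\n',
    'My charges :   -0.0000000\n',
    'first ATOMIC CHARGES AND SPIN\n',
    '-----------------------\n',
    '   0 C :   -0.208137    0.054313\n',
    'My charges :   -1.0000000\n',
]
blocks = read_charge_blocks(sample_lines)
print(blocks)
```

Because the function only iterates over lines, an open file object can be passed in place of `sample_lines`, which reads the file one line at a time instead of loading it whole.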