Question

I have a very large text file containing 1339018 lines, and I want to extract three sections:

My FILE.txt

.
.
.
-----------------------
first ATOMIC CHARGES
-----------------------
   0 C :   -0.157853
   1 C :   -0.156875
   2 C :   -0.143714
   3 C :   -0.140489
   4 S :    0.058926
   5 H :    0.128758
   6 H :    0.128814
   7 H :    0.142420
   8 H :    0.140013
My charges :   -0.0000000

------------------------
.
..
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.208137    0.054313
   1 C :   -0.206691    0.053890
   2 C :   -0.266791    0.395830
   3 C :   -0.262729    0.395691
   4 S :   -0.184730    0.179002
   5 H :    0.023341   -0.009535
   6 H :    0.023405   -0.009489
   7 H :    0.042728   -0.029862
   8 H :    0.039605   -0.029841
My charges :   -1.0000000

------------------------
.
.
.
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.086045    0.075562
   1 C :   -0.085256    0.075871
   2 C :    0.022683    0.483590
   3 C :    0.025286    0.483583
   4 S :    0.246328   -0.079498
   5 H :    0.215005   -0.003936
   6 H :    0.215043   -0.003948
   7 H :    0.224379   -0.015598
   8 H :    0.222578   -0.015627
My charges :    1.0000000

------------------------
.
.
.

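I tried the script below to extract the fourth column of each section and convert it to a list, for example:

oX = [-0.157853, -0.156875, -0.143714, ...]

oY = [-0.208137, -0.206691, ...]

oZ = [-0.086045, -0.085256, ...]

Unfortunately, the third one does not work.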

with open('FILE.txt', 'rb') as f:
     textfile_temp = f.read()
     print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges :   -0.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :   -1.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :    1.0000000")[0]

Is this possible?

Answer 1

Try a subtle change on the last line, as shown below. You were very close!

with open('FILE.txt', 'rb') as f:
     textfile_temp = f.read()
     print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges :   -0.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :   -1.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[2].split("My charges :    1.0000000")[0]
     #                                                          ^ change this
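
Why the index matters: split() returns every piece of text that lies between occurrences of the separator, so piece [1] starts right after the first 'first ATOMIC CHARGES AND SPIN' header and piece [2] starts right after the second one. The following is a minimal Python 3 sketch of that behaviour; the `sample` string is a shortened, made-up stand-in for FILE.txt, and the variable names are chosen only for illustration.

```python
# Shortened, made-up stand-in for FILE.txt (same layout, fewer rows).
sample = """first ATOMIC CHARGES
   0 C :   -0.157853
My charges :   -0.0000000
first ATOMIC CHARGES AND SPIN
   0 C :   -0.208137    0.054313
My charges :   -1.0000000
first ATOMIC CHARGES AND SPIN
   0 C :   -0.086045    0.075562
My charges :    1.0000000
"""

header = 'first ATOMIC CHARGES AND SPIN'
pieces = sample.split(header)

# Two headers produce three pieces: the text before the first header,
# the text between the two headers, and the text after the second header.
second_block = pieces[1].split('My charges :   -1.0000000')[0]
third_block = pieces[2].split('My charges :    1.0000000')[0]

print(len(pieces))
print(second_block.strip())
print(third_block.strip())
```

With index [1] in both lines, the third lookup would start at the second section again, which is why the original third print did not return the last section on its own.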

Answer 2

You can use a regular expression to extract the required values:

import re

data = []
block = []

with open('input.txt') as f_input:
    for row in f_input:
        # Capture the first decimal number on rows that begin with an atom index
        values = re.findall(r'\s+\d+.*?(-?\d+\.\d+)', row)

        if values:
            block.append(float(values[0]))
        elif row.startswith('first ATOMIC') and block:
            # The next header closes the block collected so far
            data.append(block)
            block = []

# Keep the final block, which has no header after it
if block:
    data.append(block)

oX, oY, oZ = data

print(oX)
print(oY)
print(oZ)
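
For a file of over a million lines, a header-driven variant may be more robust than testing every row against the pattern, because rows outside the charge sections that happen to look like "index element : number" are then skipped. The Python 3 sketch below is an alternative under stated assumptions, not the method of the answer above: the helper name `extract_charge_blocks` is hypothetical, and it assumes that each section starts at a line beginning with 'first ATOMIC CHARGES' and ends at the 'My charges' line.

```python
import io
import re

# Matches a data row such as "   0 C :   -0.208137    0.054313"
# and captures the first number after the colon (the charge).
ROW = re.compile(r'^\s*\d+\s+[A-Za-z]+\s*:\s*(-?\d+\.\d+)')


def extract_charge_blocks(lines):
    """Return one list of charges per 'first ATOMIC CHARGES' section.

    Hypothetical helper: a section is assumed to start at a line beginning
    with 'first ATOMIC CHARGES' and to end at the 'My charges' line.
    """
    blocks = []
    current = None  # None means we are outside a section
    for line in lines:
        if line.startswith('first ATOMIC CHARGES'):
            current = []
        elif line.startswith('My charges') and current is not None:
            blocks.append(current)
            current = None
        elif current is not None:
            match = ROW.match(line)
            if match:
                current.append(float(match.group(1)))
    return blocks


# Made-up miniature of FILE.txt used only to exercise the helper.
sample = io.StringIO(
    "first ATOMIC CHARGES\n"
    "-----------------------\n"
    "   0 C :   -0.157853\n"
    "   1 C :   -0.156875\n"
    "My charges :   -0.0000000\n"
    "   9 X :   99.000000\n"  # outside any section, so it is ignored
    "first ATOMIC CHARGES AND SPIN\n"
    "-----------------------\n"
    "   0 C :   -0.208137    0.054313\n"
    "My charges :   -1.0000000\n"
)
blocks = extract_charge_blocks(sample)
print(blocks)
```

Because the helper iterates line by line, it can be passed an open file object directly, for example `oX, oY, oZ = extract_charge_blocks(f)` inside `with open('FILE.txt') as f:`, provided the file holds exactly three such sections.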

How to extract columns between two (almost identical) strings using Python

2 answers: