Printing a section of text from a file

Time: 2018-10-19 19:02:36

Tags: python

I'm still learning Python, and I have a file like this example:

 RDKit          3D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 552 600 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 7.3071 41.3785 19.7482 0
M  V30 2 C 7.5456 41.3920 21.2703 0
M  V30 3 C 8.3653 40.1559 21.6876 0
M  V30 4 C 9.7001 40.0714 20.9228 0
M  V30 5 C 9.4398 40.0712 19.4042 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 0 1 1 2
M  V30 1 1 1 6
M  V30 2 1 1 10
M  V30 3 1 1 11
M  V30 4 1 2 3
M  V30 END BOND
M  V30 END CTAB
M  END

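I only want to print the information between the following line:

M  V30 BEGIN ATOM

and:

M  V30 END ATOM

Because the number of atoms varies from file to file, I'd like a general method that works for all of them. Can anyone help?

Many thanks.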

5 answers:

Answer 0: (score: 2)

You can try the following:

# Read file contents
with open("file.txt") as file:
    inside = False
    for line in file:
        # Start section of interest
        if line.rstrip() == "M  V30 BEGIN ATOM":
            inside = True
        # End section of interest
        elif line.rstrip() == "M  V30 END ATOM":
            inside = False
        # Inside section of interest
        elif inside:
            print(line.rstrip())
        else:
            pass
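
The same flag-based loop can be wrapped in a generator so it works for any pair of marker lines, not only the ATOM block. This is a minimal sketch: the name `iter_section` and the in-memory `sample` list are made up for illustration, and in practice you would pass an open file object instead of a list.

```python
def iter_section(lines, start_marker, end_marker):
    """Yield the lines found strictly between start_marker and end_marker."""
    inside = False
    for line in lines:
        stripped = line.rstrip()
        if stripped == start_marker:
            inside = True
        elif stripped == end_marker:
            inside = False
        elif inside:
            yield stripped


# Hypothetical sample data standing in for the lines of a real file
sample = [
    "M  V30 BEGIN CTAB\n",
    "M  V30 BEGIN ATOM\n",
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M  V30 2 C 7.5456 41.3920 21.2703 0\n",
    "M  V30 END ATOM\n",
    "M  V30 BEGIN BOND\n",
    "M  V30 0 1 1 2\n",
    "M  V30 END BOND\n",
]

atom_lines = list(iter_section(sample, "M  V30 BEGIN ATOM", "M  V30 END ATOM"))
for atom_line in atom_lines:
    print(atom_line)
```

Because the markers are parameters, the same function should also pull out the bond block when called with "M  V30 BEGIN BOND" and "M  V30 END BOND".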

Answer 1: (score: 2)

Trying to keep the logic separated, short and sweet, and given that you want a portable approach:

def print_atoms_from_file(full_file_path):
    with open(full_file_path, 'r') as f:
        start_printing = False
        for line in f:

            if 'BEGIN ATOM' in line:
                start_printing = True
                continue

            if 'END ATOM' in line:
                start_printing = False
                continue

            if start_printing:
                print(line.rstrip())

print_atoms_from_file('test_file_name.txt')

Answer 2: (score: 1)

This is how I would do it with csv.

import csv

def process_file(f):
    start_found = False
    content = []
    with open(f, 'r') as f_in:
        # Splitting on single spaces leaves empty fields wherever
        # the file has two spaces in a row (e.g. after the leading 'M')
        reader = csv.reader(f_in, delimiter=' ')
        for row in reader:
            if {'M', 'V30', 'BEGIN', 'ATOM'}.issubset(row):
                start_found = True
                continue
            elif {'M', 'V30', 'END', 'ATOM'}.issubset(row):
                break
            elif start_found:
                content.append(row)
    return content
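
If the goal is to work with the coordinates rather than just print them, each extracted line can be split into typed fields. The sketch below assumes the column layout seen in the question's sample (index, element, x, y, z after the `M  V30` prefix); the function name `parse_atom_line` is made up for illustration. It uses `str.split()` with no argument, which collapses runs of spaces and so avoids the empty fields that a single-space delimiter produces.

```python
def parse_atom_line(line):
    """Split one 'M  V30 ...' atom line into (index, element, x, y, z).

    Assumes the layout shown in the question's sample file.
    """
    fields = line.split()  # no argument: consecutive spaces are collapsed
    # fields[0] is 'M' and fields[1] is 'V30'; the data starts at fields[2]
    index = int(fields[2])
    element = fields[3]
    x, y, z = (float(value) for value in fields[4:7])
    return index, element, x, y, z


atom = parse_atom_line("M  V30 1 C 7.3071 41.3785 19.7482 0")
print(atom)
```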

Answer 3: (score: 0)

Try this:

with open('filename.txt','r') as f:
    ok_to_print = False
    for line in f:  # iterate lazily instead of loading every line at once
        line = line.strip()  # remove surrounding whitespace
        if line == 'M  V30 BEGIN ATOM':
            ok_to_print = True
        elif line == 'M  V30 END ATOM':
            ok_to_print = False
        else:
            if ok_to_print:
                print(line)

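This processes the file line by line as it is read, which is ideal for large files that cannot fit in memory all at once. For small files, you can read the whole thing into memory and use a regular expression.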

import re
data = ''
with open('filename.txt','r') as f:
    data = f.read()
a = re.compile(r'M  V30 BEGIN ATOM(.+?)M  V30 END ATOM', re.DOTALL)
results = a.findall(data)
for result in results:
    print(result.strip())

Note: none of this code has been tested. I just wrote it blind.
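
Here is a small sketch of the regular-expression approach run against an in-memory string, so it can be tried without a file. The `data` string is made-up sample content modelled on the question. The pattern anchors both markers to whole lines with `re.MULTILINE`, which is one possible way to avoid matching the marker text in the middle of another line.

```python
import re

# Hypothetical sample content standing in for f.read()
data = (
    "M  V30 BEGIN CTAB\n"
    "M  V30 BEGIN ATOM\n"
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n"
    "M  V30 2 C 7.5456 41.3920 21.2703 0\n"
    "M  V30 END ATOM\n"
    "M  V30 BEGIN BOND\n"
    "M  V30 0 1 1 2\n"
    "M  V30 END BOND\n"
)

# ^ and $ match at line boundaries (MULTILINE); '.' also matches newlines (DOTALL)
pattern = re.compile(
    r"^M  V30 BEGIN ATOM\n(.*?)^M  V30 END ATOM$",
    re.MULTILINE | re.DOTALL,
)

match = pattern.search(data)
atom_block = match.group(1).splitlines() if match else []
for atom_line in atom_block:
    print(atom_line)
```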

Answer 4: (score: 0)

You can try the following function:

def extract_lines(filename, start_line, stop_line):
    lines=[]
    with open(filename,'r') as f:
        lines=f.readlines()

    list_of_lines=[line.rstrip('\n') for line in lines]

    start_point=list_of_lines.index(start_line)
    stop_point=list_of_lines.index(stop_line)

    return "\n".join(list_of_lines[i] for i in range(start_point+1,stop_point))
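
One thing to be aware of with an index-based approach is that `list.index` raises `ValueError` when a marker is missing. The variant below is only a sketch: the name `slice_between` is made up, and returning an empty list for a missing marker is an assumed design choice, not something the question asks for. It also starts the search for the end marker after the start marker, using the optional start argument of `list.index`.

```python
def slice_between(lines, start_line, stop_line):
    """Return the lines strictly between start_line and stop_line.

    Returns an empty list if either marker is missing (an assumed choice).
    """
    try:
        start = lines.index(start_line)
        # Only look for the end marker after the start marker
        stop = lines.index(stop_line, start + 1)
    except ValueError:
        return []
    return lines[start + 1:stop]


# Hypothetical sample: lines already stripped of their trailing newline
sample_lines = [
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 7.3071 41.3785 19.7482 0",
    "M  V30 END ATOM",
    "M  END",
]

found = slice_between(sample_lines, "M  V30 BEGIN ATOM", "M  V30 END ATOM")
missing = slice_between(sample_lines, "M  V30 BEGIN BOND", "M  V30 END BOND")
print(found)
print(missing)
```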