Question

I'm still learning Python, and I have an example file like this:

 RDKit          3D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 552 600 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 7.3071 41.3785 19.7482 0
M  V30 2 C 7.5456 41.3920 21.2703 0
M  V30 3 C 8.3653 40.1559 21.6876 0
M  V30 4 C 9.7001 40.0714 20.9228 0
M  V30 5 C 9.4398 40.0712 19.4042 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 0 1 1 2
M  V30 1 1 1 6
M  V30 2 1 1 10
M  V30 3 1 1 11
M  V30 4 1 2 3
M  V30 END BOND
M  V30 END CTAB
M  END

I only want to print the information between this line:

M  V30 BEGIN ATOM

and this one:

M  V30 END ATOM

Since the number of atoms varies from file to file, I'd like a general approach. Can anyone help?

Many thanks.

Answer 1

You can try the following:

# Read file contents
with open("file.txt") as file:
    inside = False
    for line in file:
        # Start section of interest
        if line.rstrip() == "M  V30 BEGIN ATOM":
            inside = True
        # End section of interest
        elif line.rstrip() == "M  V30 END ATOM":
            inside = False
        # Inside section of interest
        elif inside:
            print(line.rstrip())
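The same flag-based loop can be wrapped in a reusable generator so the markers are parameters rather than hard-coded strings. This is a minimal sketch: `lines_between` is a hypothetical helper name, and it assumes each marker matches a whole line once trailing whitespace is stripped. Because it accepts any iterable of strings, you can pass an open file object and still read one line at a time.

```python
import io
from typing import Iterable, Iterator


def lines_between(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield each line strictly between a start marker and an end marker.

    Hypothetical helper. `lines` can be any iterable of strings, including
    an open file object, so the file is still consumed one line at a time.
    """
    inside = False
    for line in lines:
        text = line.rstrip()  # drop the newline and any trailing spaces
        if text == start:
            inside = True
        elif text == end:
            inside = False
        elif inside:
            yield text


# Demo on an in-memory file; with a real file, pass open("file.txt") instead.
sample = io.StringIO(
    "M  V30 BEGIN CTAB\n"
    "M  V30 BEGIN ATOM\n"
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n"
    "M  V30 2 C 7.5456 41.3920 21.2703 0\n"
    "M  V30 END ATOM\n"
    "M  V30 BEGIN BOND\n"
)
atom_lines = list(lines_between(sample, "M  V30 BEGIN ATOM", "M  V30 END ATOM"))
for atom_line in atom_lines:
    print(atom_line)
```

With a real file: `with open("file.txt") as f: for line in lines_between(f, "M  V30 BEGIN ATOM", "M  V30 END ATOM"): print(line)`.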

Answer 2

Since you want the logic to stay short and neat, and you want an approach that works for any file:

def print_atoms_from_file(full_file_path):
    with open(full_file_path, 'r') as f:
        start_printing = False
        for line in f:

            if 'BEGIN ATOM' in line:
                start_printing = True
                continue

            if 'END ATOM' in line:
                start_printing = False
                continue

            if start_printing:
                print(line.rstrip())

print_atoms_from_file('test_file_name.txt')
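If you only need the first atom block, a different technique, `itertools.dropwhile` followed by `itertools.takewhile`, expresses "skip to the start marker, then keep lines until the end marker" without a flag variable. This is a sketch under the assumption that the markers match whole lines after `rstrip()`; `first_atom_block` is a hypothetical name. Unlike the loop above, it stops after the first block, which is enough for a file holding a single molecule.

```python
import itertools
from typing import Iterable, List

START = "M  V30 BEGIN ATOM"
END = "M  V30 END ATOM"


def first_atom_block(lines: Iterable[str]) -> List[str]:
    """Return the lines of the first BEGIN ATOM ... END ATOM block, or []."""
    stripped = (line.rstrip() for line in lines)
    # Skip everything before the start marker; dropwhile yields the marker itself...
    from_start = itertools.dropwhile(lambda s: s != START, stripped)
    # ...so discard it here (the None default avoids StopIteration if it is absent).
    next(from_start, None)
    # Keep lines until the end marker is reached.
    return list(itertools.takewhile(lambda s: s != END, from_start))


example = [
    "M  V30 COUNTS 552 600 0 0 0\n",
    "M  V30 BEGIN ATOM\n",
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M  V30 END ATOM\n",
    "M  V30 BEGIN BOND\n",
]
print(first_atom_block(example))
```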

Answer 3

This is how I would do it with csv:

import csv

def process_file(f):
    start_found = False
    content = []
    with open(f, 'r') as f_in:
        reader = csv.reader(f_in, delimiter=' ')
        for row in reader:
            if set(['M', 'V30', 'BEGIN', 'ATOM']).issubset(row):
                start_found = True
                continue
            elif set(['M', 'V30', 'END', 'ATOM']).issubset(row):
                break
            elif start_found:
                content.append(row)
    return content
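One caveat with `csv.reader(..., delimiter=' ')`: the double space after `M` produces an empty field, so each row looks like `['M', '', 'V30', ...]`. If you want the atom data itself, plain `str.split()` with no argument collapses runs of spaces. The sketch below assumes the field layout seen in the sample (`M`, `V30`, index, element, x, y, z, then further fields that are ignored); `parse_atom_line` is a hypothetical name.

```python
import csv
from typing import Tuple


def parse_atom_line(line: str) -> Tuple[int, str, Tuple[float, float, float]]:
    """Split one 'M  V30 ...' atom line into (index, element, (x, y, z)).

    Hypothetical helper; assumes the layout seen in the sample file:
    'M', 'V30', index, element, x, y, z, then fields ignored here.
    """
    fields = line.split()  # split() with no argument collapses runs of spaces
    index = int(fields[2])
    element = fields[3]
    x, y, z = (float(v) for v in fields[4:7])
    return index, element, (x, y, z)


sample = "M  V30 1 C 7.3071 41.3785 19.7482 0"
# csv.reader with delimiter=' ' keeps an empty field for the double space.
csv_row = next(csv.reader([sample], delimiter=" "))
print(csv_row)
print(parse_atom_line(sample))
```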

Answer 4

Give this a try:

with open('filename.txt', 'r') as f:
    ok_to_print = False
    for line in f:  # iterate lazily instead of loading f.readlines()
        line = line.strip()  # remove leading/trailing whitespace
        if line == 'M  V30 BEGIN ATOM':
            ok_to_print = True
        elif line == 'M  V30 END ATOM':
            ok_to_print = False
        elif ok_to_print:
            print(line)

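This processes the file line by line as it is read, which makes it well suited to large files that don't fit in memory. For small files, you can instead read the whole content into memory and use a regular expression: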

import re

with open('filename.txt', 'r') as f:
    data = f.read()
# DOTALL lets '.' match newlines, so one match can span several lines
pattern = re.compile(r'M  V30 BEGIN ATOM\n(.*?)M  V30 END ATOM', re.DOTALL)
for result in pattern.findall(data):
    print(result, end='')  # each captured block already ends with '\n'

Note: none of this code has been tested; I wrote it without running it.

Answer 5

You can try the following function:

def extract_lines(filename, start_line, stop_line):
    # Read every line and drop the trailing newline characters
    with open(filename, 'r') as f:
        list_of_lines = [line.rstrip('\n') for line in f]

    # list.index raises ValueError if a marker line is missing
    start_point = list_of_lines.index(start_line)
    stop_point = list_of_lines.index(stop_line)

    # Join the lines strictly between the two markers
    return "\n".join(list_of_lines[start_point + 1:stop_point])
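Because `list.index` raises `ValueError` when a marker line is absent, a file without an atom block would crash this function. A minimal variant, assuming an empty string is an acceptable "not found" result, catches that and also searches for the stop marker only after the start marker. `extract_lines_or_empty` is a hypothetical name, and the temporary file exists only to make the demo self-contained.

```python
import os
import tempfile


def extract_lines_or_empty(filename: str, start_line: str, stop_line: str) -> str:
    """Hypothetical variant of extract_lines: return '' if a marker is missing."""
    with open(filename, "r") as f:
        lines = [line.rstrip("\n") for line in f]
    try:
        start_point = lines.index(start_line)
        # Look for the stop marker only after the start marker.
        stop_point = lines.index(stop_line, start_point + 1)
    except ValueError:
        return ""
    return "\n".join(lines[start_point + 1:stop_point])


# Demo with a throwaway file, written here only for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(
        "M  V30 BEGIN ATOM\n"
        "M  V30 1 C 7.3071 41.3785 19.7482 0\n"
        "M  V30 END ATOM\n"
    )
    path = tmp.name

found = extract_lines_or_empty(path, "M  V30 BEGIN ATOM", "M  V30 END ATOM")
missing = extract_lines_or_empty(path, "M  V30 BEGIN BOND", "M  V30 END BOND")
os.remove(path)
print(found)
```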
