I'm still learning Python, and I have a file like this example:
RDKit 3D
0 0 0 0 0 0 0 0 0 0999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 552 600 0 0 0
M V30 BEGIN ATOM
M V30 1 C 7.3071 41.3785 19.7482 0
M V30 2 C 7.5456 41.3920 21.2703 0
M V30 3 C 8.3653 40.1559 21.6876 0
M V30 4 C 9.7001 40.0714 20.9228 0
M V30 5 C 9.4398 40.0712 19.4042 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 0 1 1 2
M V30 1 1 1 6
M V30 2 1 1 10
M V30 3 1 1 11
M V30 4 1 2 3
M V30 END BOND
M V30 END CTAB
M END
I only want to print the information between these two lines:
M V30 BEGIN ATOM
and:
M V30 END ATOM
Because the number of atoms varies from file to file, I'd like a general approach. Can anyone help?
Thanks a lot.
Answer 0 (score: 2)
You can try the following:
# Read the file line by line
with open("file.txt") as file:
    inside = False
    for line in file:
        # Start of the section of interest
        if line.rstrip() == "M V30 BEGIN ATOM":
            inside = True
        # End of the section of interest
        elif line.rstrip() == "M V30 END ATOM":
            inside = False
        # Inside the section of interest
        elif inside:
            print(line.rstrip())
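Since the question asks for a general approach, the same loop can be wrapped in a generator that takes the section name as a parameter. This is only a sketch: `iter_block`, its `section` and `prefix` parameters, and the shortened in-memory sample are names and data I made up for illustration, and the markers are matched exactly as they appear in the question.

```python
import io


def iter_block(lines, section="ATOM", prefix="M V30"):
    """Yield the stripped lines between the BEGIN and END markers of a section."""
    begin = f"{prefix} BEGIN {section}"
    end = f"{prefix} END {section}"
    inside = False
    for line in lines:
        text = line.rstrip()
        if text == begin:
            inside = True
        elif text == end:
            inside = False
        elif inside:
            yield text


# Shortened copy of the file from the question, held in memory for the demo
SAMPLE = """M V30 BEGIN CTAB
M V30 BEGIN ATOM
M V30 1 C 7.3071 41.3785 19.7482 0
M V30 2 C 7.5456 41.3920 21.2703 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 0 1 1 2
M V30 END BOND
M V30 END CTAB
M END
"""

atoms = list(iter_block(io.StringIO(SAMPLE), "ATOM"))
bonds = list(iter_block(io.StringIO(SAMPLE), "BOND"))
print("\n".join(atoms))
```

With a real file you would pass the open file object instead of `io.StringIO(SAMPLE)`; the generator never holds more than one line at a time.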
Answer 1 (score: 2)
To keep the logic separate, short and tidy, and because you want a portable approach, you can wrap it in a function:
def print_atoms_from_file(full_file_path):
    with open(full_file_path, 'r') as f:
        start_printing = False
        for line in f:
            if 'BEGIN ATOM' in line:
                start_printing = True
                continue
            if 'END ATOM' in line:
                start_printing = False
                continue
            if start_printing:
                # print() as a function works on Python 3; rstrip() avoids doubled newlines
                print(line.rstrip())

print_atoms_from_file('test_file_name.txt')
Answer 2 (score: 1)
This is how I would do it using csv.
import csv

def process_file(f):
    start_found = False
    content = []
    with open(f, 'r') as f_in:
        reader = csv.reader(f_in, delimiter=' ')
        for row in reader:
            # A marker row contains all four tokens
            if {'M', 'V30', 'BEGIN', 'ATOM'}.issubset(row):
                start_found = True
                continue
            elif {'M', 'V30', 'END', 'ATOM'}.issubset(row):
                break
            elif start_found:
                content.append(row)
    return content
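One caveat with `csv.reader(..., delimiter=' ')`: two spaces in a row produce an empty-string field, so rows from a file that pads with extra spaces will contain `''` entries. If that matters, `str.split()` with no arguments splits on any run of whitespace. The sketch below assumes a sample with two spaces after `M`, which is my own variation on the file in the question; `read_atom_rows` is a hypothetical name.

```python
import io


def read_atom_rows(lines):
    """Return the ATOM block as lists of whitespace-separated fields."""
    rows = []
    started = False
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if fields == ["M", "V30", "BEGIN", "ATOM"]:
            started = True
        elif fields == ["M", "V30", "END", "ATOM"]:
            break
        elif started:
            rows.append(fields)
    return rows


# Assumed sample: same atoms as in the question, but with doubled spaces
sample = io.StringIO(
    "M  V30 BEGIN ATOM\n"
    "M  V30 1 C 7.3071 41.3785 19.7482 0\n"
    "M  V30 2 C 7.5456 41.3920 21.2703 0\n"
    "M  V30 END ATOM\n"
)
rows = read_atom_rows(sample)
for row in rows:
    print(row)
```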
Answer 3 (score: 0)
Try this:
with open('filename.txt', 'r') as f:
    ok_to_print = False
    for line in f:
        line = line.strip()  # remove surrounding whitespace
        if line == 'M V30 BEGIN ATOM':
            ok_to_print = True
        elif line == 'M V30 END ATOM':
            ok_to_print = False
        elif ok_to_print:
            print(line)
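This processes the file line by line as you read it, which is ideal for large files that don't fit in memory. For small files, you can read the whole thing into memory and use a regular expression.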
import re

with open('filename.txt', 'r') as f:
    data = f.read()
# DOTALL lets "." match newlines, so the group can span several lines
a = re.compile(r'M V30 BEGIN ATOM(.+?)M V30 END ATOM', re.DOTALL)
results = a.findall(data)
for result in results:
    print(result.strip())
Note: none of this code has been tested; it was written blind.
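For what it's worth, here is a small variant of the regular-expression idea that can be run against an in-memory string. It is a sketch, not the answer's exact code: it anchors the markers to whole lines with `re.MULTILINE`, and the `SAMPLE` string is a shortened copy of the file from the question.

```python
import re

# Shortened copy of the file from the question
SAMPLE = (
    "M V30 BEGIN CTAB\n"
    "M V30 BEGIN ATOM\n"
    "M V30 1 C 7.3071 41.3785 19.7482 0\n"
    "M V30 2 C 7.5456 41.3920 21.2703 0\n"
    "M V30 END ATOM\n"
    "M V30 BEGIN BOND\n"
)

# MULTILINE makes ^ and $ match at line boundaries;
# DOTALL lets "." cross newlines inside the captured group.
pattern = re.compile(
    r"^M V30 BEGIN ATOM\n(.*?)^M V30 END ATOM$",
    re.MULTILINE | re.DOTALL,
)

match = pattern.search(SAMPLE)
atom_lines = match.group(1).splitlines() if match else []
for atom_line in atom_lines:
    print(atom_line)
```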
Answer 4 (score: 0)
You can try the following function:
def extract_lines(filename, start_line, stop_line):
    with open(filename, 'r') as f:
        list_of_lines = [line.rstrip('\n') for line in f]
    start_point = list_of_lines.index(start_line)
    stop_point = list_of_lines.index(stop_line)
    return "\n".join(list_of_lines[start_point + 1:stop_point])
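Note that `list.index` raises `ValueError` when a marker is missing. A sketch of the same idea that returns an empty list in that case, and only looks for the stop marker after the start marker, might look like this. `extract_between` is a hypothetical name, and it takes a list of lines rather than a filename so it can be tried without a file on disk.

```python
def extract_between(lines, start_line, stop_line):
    """Return the lines strictly between start_line and stop_line, or [] if either is missing."""
    stripped = [line.rstrip("\n") for line in lines]
    try:
        start = stripped.index(start_line)
        # Search for the stop marker only after the start marker
        stop = stripped.index(stop_line, start + 1)
    except ValueError:
        return []
    return stripped[start + 1:stop]


# Shortened sample based on the file in the question
lines = [
    "M V30 BEGIN ATOM\n",
    "M V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M V30 END ATOM\n",
]
found = extract_between(lines, "M V30 BEGIN ATOM", "M V30 END ATOM")
missing = extract_between(lines, "M V30 BEGIN BOND", "M V30 END BOND")
print("\n".join(found))
```

With a real file, `extract_between(open(path).readlines(), ...)` or passing the lines read inside a `with` block would work the same way.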