I have a plain-text file that I want to split into multiple files. The file format is as follows:
-----BEGIN CERTIFICATE-----
text1
text2
text3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text4
text5
text6
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text7
text8
text9
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text10
text11
text12
-----END CERTIFICATE-----
I want to split out each block, from the BEGIN line (inclusive) to the END line (inclusive).
Here is what I have written so far:
with open('/Users/arl/Downloads/bundle.pem', 'r') as cert_file:
    cert = cert_file.readlines()

def parse_file(filename=None, variable=None):
    with open(filename, "w") as variable:
        for line in cert:
            if "BEGIN" in line:
                variable.write(line)
                continue
            elif "END" in line:
                variable.write(line)
                parse_file(filename="int1.pem", variable="int1_file")
                parse_file(filename="int2.pem", variable="int2_file")
                parse_file(filename="end.pem", variable="end_file")
            print line.rstrip()
            variable.write(line)
        variable.close()

parse_file(filename="root.pem", variable="root_file")
The error I currently get:
    parse_file(filename="int1.pem", variable="int1_file")
  File "splitter.py", line 12, in parse_file
    parse_file(filename="int1.pem", variable="int1_file")
  File "splitter.py", line 17, in parse_file
    variable.close()
RuntimeError: maximum recursion depth exceeded while calling a Python object
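Only root.pem and int1.pem are written (both have the same content, which should not be the case).
What do I need to do to parse the file and write each new block to a new file? At which point in the loop is it correct for the function to call itself with new arguments?
Thanks.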
Answer 0 (score: 1)
Using a regular expression:
import re
content = '''
-----BEGIN CERTIFICATE-----
text1
text2
text3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text4
text5
text6
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text7
text8
text9
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text10
text11
text12
-----END CERTIFICATE-----
'''
content = content.strip('\n')
# Non-greedy match; re.DOTALL lets "." span newlines. The match covers the
# whole block, including the BEGIN and END lines, as the question asks.
pattern = re.compile(r'-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----', re.DOTALL)
certs = pattern.findall(content)
for cert in certs:
    print(cert)
    print('')
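The regex approach can be carried one step further so that each matched block ends up in its own file. The following is only a minimal sketch: the helper names `split_blocks` and `write_blocks` and the `cert<N>.pem` naming scheme are assumptions made for illustration, not part of the original answer.

```python
import re

# Match one whole block, including its BEGIN and END lines.
# re.DOTALL lets "." span newlines; ".*?" keeps each match as short as possible.
BLOCK_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_blocks(text):
    """Return every BEGIN...END block found in text, markers included."""
    return BLOCK_RE.findall(text)


def write_blocks(blocks, prefix="cert"):
    """Write each block to <prefix><index>.pem (hypothetical naming scheme)."""
    names = []
    for index, block in enumerate(blocks):
        name = "%s%d.pem" % (prefix, index)
        with open(name, "w") as out:
            out.write(block + "\n")
        names.append(name)
    return names


sample = (
    "-----BEGIN CERTIFICATE-----\ntext1\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\ntext2\n-----END CERTIFICATE-----\n"
)
blocks = split_blocks(sample)
print(len(blocks))
```

Calling `write_blocks(blocks)` afterwards would create `cert0.pem` and `cert1.pem` in the current directory. Reading the whole bundle into memory is fine for certificate bundles, which are small; a very large input would be better handled line by line.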
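Answer 1 (score: 1)
I can't see recursion being useful here. Instead, you can create a list of output filenames and iterate over it with iter and next, opening a file when "BEGIN" is encountered and closing that same file when "END" is encountered.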
def parse_file(input_file, output_files):
    filenames = iter(output_files)
    with open(input_file, 'r') as cert_file:
        for line in cert_file:
            if "BEGIN" in line:
                # next(filenames) works on Python 2.6+ and Python 3
                output = open(next(filenames), 'w')
            output.write(line)
            if "END" in line:
                output.close()
        output.close()  # just in case not already closed

input_file = '/Users/arl/Downloads/bundle.pem'
output_files = ['root.pem', 'int1.pem', 'int2.pem', 'end.pem']
parse_file(input_file=input_file, output_files=output_files)
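This will raise an error if there is any whitespace or other content outside a block, that is, after an "END" line and before the next "BEGIN" line, because the code then tries to write to a file that is closed or was never opened. If that is a problem, you can add a check that an output file is currently open.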
def parse_file(input_file, output_files):
    filenames = iter(output_files)
    output = None  # no output file is open yet
    with open(input_file, 'r') as cert_file:
        for line in cert_file:
            if "BEGIN" in line:
                output = open(next(filenames), 'w')
            # only write while a block's output file is open
            if output and not output.closed:
                output.write(line)
            if "END" in line and output:
                output.close()
        if output:
            output.close()  # harmless if already closed
Or equivalently, using nested loops:
def parse_file(input_file, output_files):
    output = None
    with open(input_file, 'r') as cert_file:
        for output_file in output_files:
            # the inner loop resumes reading where the previous pass stopped
            for line in cert_file:
                if "BEGIN" in line:
                    output = open(output_file, 'w')
                if output and not output.closed:
                    output.write(line)
                if "END" in line and output:
                    output.close()
                    break  # breaks out of inner loop and gets next output_file
        if output:
            output.close()
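To make the iter/next state machine easier to follow, it can be exercised in memory without touching the filesystem. This is a sketch only: `assign_blocks` is a hypothetical helper, and raising `ValueError` when the filenames run out is an assumed design choice (the versions above would instead fail with `StopIteration` from `next`).

```python
def assign_blocks(lines, output_names):
    """Map each output name to the text of one BEGIN...END block, in order."""
    names = iter(output_names)
    result = {}
    current = None  # name of the block being collected, or None outside a block
    for line in lines:
        if "BEGIN" in line:
            current = next(names, None)  # None signals that the names ran out
            if current is None:
                raise ValueError("more blocks than output names")
            result[current] = ""
        if current is not None:
            result[current] += line  # lines outside any block are skipped
        if "END" in line:
            current = None
    return result


lines = [
    "-----BEGIN CERTIFICATE-----\n", "text1\n", "-----END CERTIFICATE-----\n",
    "\n",  # stray blank line between blocks
    "-----BEGIN CERTIFICATE-----\n", "text2\n", "-----END CERTIFICATE-----\n",
]
assigned = assign_blocks(lines, ["root.pem", "end.pem"])
print(sorted(assigned))
```

Passing the default `None` to `next` is what lets the function report a clear error when the bundle contains more certificates than names were supplied.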
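Answer 2 (score: 0)
Similar to the other answers, but it allows any number of BEGIN/END segments without having to list the filenames by hand. The script renames the last file it outputs. As others have mentioned, recursion is not needed. (Recursion will drive you mad.)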
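import os

collect = False  # True while inside a BEGIN ... END block
file_number = -1
with open('big_file.txt') as big:
    for line in big:
        if line.startswith('-----BEGIN'):
            collect = True
            file_number += 1
            little = open('int%s.pem' % file_number, 'w')
            little.write(line)  # include the BEGIN line
        elif line.startswith('-----END'):
            little.write(line)  # include the END line
            little.close()
            collect = False
        elif collect:
            little.write(line)

# Rename the last file written, if any block was found.
if file_number >= 0:
    os.rename('int%s.pem' % file_number, 'end.pem')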
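If the goal is the naming used in the question (root.pem first, end.pem last, int1.pem, int2.pem and so on in between), the names can be generated once the number of blocks is known. This is a sketch under that assumption; `chain_names` is a hypothetical helper, and the treatment of a single block as root.pem is a guess at what would be wanted.

```python
def chain_names(count):
    """Return output filenames for count blocks: root, int1..intN, end."""
    if count <= 0:
        return []
    if count == 1:
        return ["root.pem"]  # assumption: a lone certificate is the root
    middle = ["int%d.pem" % i for i in range(1, count - 1)]
    return ["root.pem"] + middle + ["end.pem"]


print(chain_names(4))  # ['root.pem', 'int1.pem', 'int2.pem', 'end.pem']
```

A list produced this way could be passed as output_files to a parse_file function like the ones in the previous answer, which avoids renaming files after the fact.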