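Splitting a text file on a delimiter in Python using recursion

Date: 2017-06-16 19:41:35

Tags: python recursion

I have a plain-text file that I want to split into multiple files. The file is formatted like this: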

-----BEGIN CERTIFICATE-----
text1
text2
text3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text4
text5
text6
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text7
text8
text9
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text10
text11
text12
-----END CERTIFICATE-----

I want to split out each block, from the BEGIN line (inclusive) through the END line (inclusive).

This is what I have written so far:

with open('/Users/arl/Downloads/bundle.pem', 'r') as cert_file:
    cert = cert_file.readlines()

def parse_file(filename=None, variable=None):
    with open(filename, "w") as variable:
        for line in cert:
            if "BEGIN" in line:
                variable.write(line)
                continue
            elif "END" in line:
                variable.write(line)
                parse_file(filename="int1.pem", variable="int1_file")
                parse_file(filename="int2.pem", variable="int2_file")
                parse_file(filename="end.pem", variable="end_file")
            print line.rstrip()
            variable.write(line)
        variable.close()

parse_file(filename="root.pem", variable="root_file")

The error I currently get:

    parse_file(filename="int1.pem", variable="int1_file")
  File "splitter.py", line 12, in parse_file
    parse_file(filename="int1.pem", variable="int1_file")
  File "splitter.py", line 17, in parse_file
    variable.close()
RuntimeError: maximum recursion depth exceeded while calling a Python object

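Only root.pem and int1.pem are written (both have the same content, which should not be the case).

What do I need to do to parse the file and write each new block to a new file? At which point in the loop should the function call itself with new arguments?

Thanks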

3 answers:

Answer 0: (score: 1)

Using a regular expression:

import re

content = '''
-----BEGIN CERTIFICATE-----
text1
text2
text3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text4
text5
text6
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text7
text8
text9
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
text10
text11
text12
-----END CERTIFICATE-----
'''

content = content.strip('\n')

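# Raw string; hyphens need no escaping outside a character class.
# The non-greedy outer group captures the body between the BEGIN and END lines.
pattern = re.compile(r'-----BEGIN CERTIFICATE-----((.|\n)*?)-----END CERTIFICATE-----')
certs = re.findall(pattern, content)
for cert in certs:
    # findall returns one tuple per match; index 0 is the outer group
    cert_content = cert[0].strip('\n')
    print(cert_content)
    print('')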
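
The pattern above captures only the body of each block. If the BEGIN and END lines should be kept, as the question asks, one possible variation is to match the whole block and write each match to its own file. This is only a sketch: the function name split_pem_blocks, the out_dir argument, and the cert<N>.pem naming scheme are assumptions made for illustration and do not come from the question.

```python
import os
import re

# Matches one whole block, delimiter lines included.
# re.DOTALL lets '.' match newlines, so no (.|\n) alternation is needed.
BLOCK_RE = re.compile(
    r'-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----',
    re.DOTALL,
)

def split_pem_blocks(text, out_dir='.'):
    """Write each BEGIN..END block in text to out_dir/cert<N>.pem (hypothetical naming)."""
    paths = []
    for index, block in enumerate(BLOCK_RE.findall(text)):
        path = os.path.join(out_dir, 'cert%d.pem' % index)
        with open(path, 'w') as out:
            out.write(block + '\n')  # restore the newline after the END line
        paths.append(path)
    return paths
```

Anything outside a BEGIN/END pair is simply not matched, so stray lines between blocks are dropped rather than raising an error.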

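Answer 1: (score: 1)

I can't see how recursion would be useful here. Instead, you can build a list of output filenames and step through it with iter and next, opening a file when "BEGIN" is encountered and closing that same file when "END" is encountered.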

def parse_file(input_file, output_files):
    filenames = iter(output_files)
    with open(input_file, 'r') as cert_file:
        for line in cert_file:
            if "BEGIN" in line:
                output = open(next(filenames), 'w')  # next() works on Python 2.6+ and 3
            output.write(line)
            if "END" in line:
                output.close()
    output.close() # just in case not already closed

input_file = '/Users/arl/Downloads/bundle.pem'
output_files = ['root.pem', 'int1.pem', 'int2.pem', 'end.pem']
parse_file(input_file=input_file, output_files=output_files)

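This raises an error if there is any whitespace or other content between one block's "END" and the next "BEGIN", because the code then tries to write to a file that is already closed. If that is a problem, you can add a line that checks whether an output file is currently open.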

def parse_file(input_file, output_files):
    filenames = iter(output_files)
    output = None
    with open(input_file, 'r') as cert_file:
        for line in cert_file:
            if "BEGIN" in line:
                output = open(next(filenames), 'w')
            if output and not output.closed:
                output.write(line)
            if "END" in line and output:
                output.close()
    if output:
        output.close()  # harmless if already closed; skipped if no block was found

Or, equivalently, using nested loops:

def parse_file(input_file, output_files):
    output = None
    with open(input_file, 'r') as cert_file:
        for output_file in output_files:
            for line in cert_file:
                if "BEGIN" in line:
                    output = open(output_file, 'w')
                if output and not output.closed:
                    output.write(line)
                if "END" in line and output:
                    output.close()
                    break  # breaks out of inner loop and gets next output_file
    if output:
        output.close()  # skipped if the input contained no block at all
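
The same idea can also be split into two steps: first group the lines into blocks, then pair each block with a filename. The following is a sketch rather than a drop-in replacement. iter_blocks and write_blocks are hypothetical names, and it assumes, like the code above, that only the delimiter lines contain "BEGIN" or "END".

```python
def iter_blocks(lines):
    """Yield each BEGIN..END block as a list of lines, delimiters included."""
    block = None
    for line in lines:
        if "BEGIN" in line:
            block = []  # start collecting a new block
        if block is not None:
            block.append(line)
        if "END" in line and block is not None:
            yield block
            block = None  # ignore everything until the next BEGIN


def write_blocks(input_file, output_files):
    """Write the n-th block of input_file to the n-th name in output_files."""
    with open(input_file) as cert_file:
        # zip stops at the shorter sequence, so surplus blocks or names are ignored
        for name, block in zip(output_files, iter_blocks(cert_file)):
            with open(name, 'w') as output:
                output.writelines(block)
```

Keeping the grouping in a generator means the file handling needs no open/closed bookkeeping: each output file is opened only once a complete block is available.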

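Answer 2: (score: 0)

Similar to the other answers, but it allows any number of BEGIN/END sections without listing the filenames by hand. The script renames the last file it writes. As others have mentioned, recursion is not needed. (Recursion will drive you crazy.)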

collect = False
file_number = -1
with open('big_file.txt') as big:
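    for line in big:
        if line.startswith('-----BEGIN'):
            collect = True
            file_number += 1
            little = open('int%s.pem' % file_number, 'w')
            little.write(line)  # keep the BEGIN line, as the question asks
        elif line.startswith('-----END') and collect:
            little.write(line)  # keep the END line, as the question asks
            little.close()
            collect = False
        elif collect:
            # only write lines that fall inside a BEGIN/END block
            little.write(line)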

import os
os.rename('int%s.pem' % file_number, 'end.pem')
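
If recursion is wanted anyway, each call has to continue from where the previous one stopped. In the question's code every call loops over the full cert list from the beginning, so it reaches the first END again and recurses without end. A minimal sketch of a version that terminates, assuming a hypothetical helper split_recursive that shares a single iterator across calls:

```python
def split_recursive(lines, filenames):
    """Write one BEGIN..END block per call, then recurse for the remaining names."""
    # iter() on an iterator returns the same object, so progress is kept across calls
    lines = iter(lines)
    if not filenames:
        return  # no names left: stop
    block = []
    for line in lines:
        if block or "BEGIN" in line:
            block.append(line)
        if block and "END" in line:
            break  # one complete block collected
    if not block:
        return  # input exhausted: stop
    with open(filenames[0], 'w') as out:
        out.writelines(block)
    split_recursive(lines, filenames[1:])
```

The recursion depth equals the number of blocks, so a plain loop remains the simpler choice for large bundles.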