Question

I have a text file like this:

-----Starting Step for step1-----
text1
text2
text3
-----Ending Step for step1-----
-----Starting Step for step2-----
text4
text5
text6
-----Starting Step for step3-----
text7
text8
text9
-----Ending Step for step3-----

text10
text11
text12
-----Ending Step for step2-----

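After opening the log file, I have been trying to search for the "Starting Step" pattern and its corresponding content.

I save the content in the variable `value`. If another "Starting Step" pattern appears before the matching "Ending Step" pattern is reached, it should be treated as a child of the earlier parent.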

with open('C:\Python27\sample.log','r') as f:
    with tag('html'):
        with tag('body'):
            with tag('pre'):
                for line in f:
                        value=re.findall(r'Starting Step for (\w+)',line)
                        new_value=re.findall(r'Ending Step for (\w+)',line)
                        if value not in parent_tag_stop and value not in parent_tag_start:
                            if parent_tag_start:
                                parent_tag_start.append(value)
                            else:
                               child_tag[parent_tag_start[-1]] =value

                        elif new_value:
                                parent_tag_stop.append(value)
                                if tag==new_value[0]:
                                    with tag('a', href='#{0}'.format(new_value)):
                                        text(value)
                                    value=''
                        else:
                            value+=line

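I want to split out each block from its Starting Step to its Ending Step and create an HTML page that uses step1, step2, etc. as anchor tags with the corresponding content as their text. Here step3 would be a child anchor under step2, and its content would also be part of step2.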

Answer 1

I'm not sure whether this is the right output structure, but here is a solution:

from bs4 import BeautifulSoup
import re

data = '''-----Starting Step for step1-----
text1
text2
text3
-----Ending Step for step1-----
-----Starting Step for step2-----
text4
text5
text6
-----Starting Step for step3-----
text7
text8
text9
-----Ending Step for step3-----

text10
text11
text12
-----Ending Step for step2-----'''


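# Replace each start marker with an anchor followed by an opening <div>.
data = re.sub(r'-----Starting Step for step(\d+)-----', r'<a href="#step\1" /><div>', data)
# Replace each end marker with the closing </div> of that step.
data = re.sub(r'-----Ending Step for step\d+-----', r'</div>', data)

# The lxml parser repairs the fragment and wraps it in <html>/<body>.
soup = BeautifulSoup(data, 'lxml')
print(soup.prettify())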

This prints:

<html>
 <body>
  <a href="#step1">
  </a>
  <div>
   text1
text2
text3
  </div>
  <a href="#step2">
  </a>
  <div>
   text4
text5
text6
   <a href="#step3">
   </a>
   <div>
    text7
text8
text9
   </div>
   text10
text11
text12
  </div>
 </body>
</html>
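
If the parent/child relationship needs to be tracked explicitly, as the question describes, a stack is one way to do it: push a node on every "Starting Step" line and pop it on the matching "Ending Step" line, so any step opened in between becomes a child of the step on top of the stack. The following is only a minimal sketch under some assumptions: it targets Python 3 and the standard library only, the names `parse_steps`, `render`, and `SAMPLE` and the dict layout are made up for illustration, and the `<div id=...>`/`<a>`/`<p>` layout is just one guess at the desired page structure.

```python
import re
from html import escape

# Marker patterns taken from the sample log in the question.
START = re.compile(r'-----Starting Step for (\w+)-----')
END = re.compile(r'-----Ending Step for (\w+)-----')


def parse_steps(lines):
    """Build a tree of steps (hypothetical helper).

    Each node is {'name': str, 'items': [...]}, where items holds text
    lines (str) and child steps (dict) in their original order.
    """
    root = {'name': None, 'items': []}
    stack = [root]  # stack[-1] is the step currently open
    for raw in lines:
        line = raw.rstrip('\n')
        start = START.match(line)
        end = END.match(line)
        if start:
            # A start seen before the current end is a child of stack[-1].
            node = {'name': start.group(1), 'items': []}
            stack[-1]['items'].append(node)
            stack.append(node)
        elif end:
            if len(stack) == 1 or stack[-1]['name'] != end.group(1):
                raise ValueError('unbalanced end marker: ' + line)
            stack.pop()
        elif line.strip():
            # Plain text belongs to the innermost open step; blanks are skipped.
            stack[-1]['items'].append(line)
    if len(stack) != 1:
        raise ValueError('missing end marker for ' + stack[-1]['name'])
    return root


def render(node):
    """Return a list of HTML lines for the node's items (hypothetical helper)."""
    out = []
    for item in node['items']:
        if isinstance(item, dict):
            name = escape(item['name'])
            # The id makes the div a target for its own anchor link.
            out.append('<div id="{0}"><a href="#{0}">{0}</a>'.format(name))
            out.extend(render(item))
            out.append('</div>')
        else:
            out.append('<p>{0}</p>'.format(escape(item)))
    return out


SAMPLE = """-----Starting Step for step1-----
text1
-----Ending Step for step1-----
-----Starting Step for step2-----
text4
-----Starting Step for step3-----
text7
-----Ending Step for step3-----

text10
-----Ending Step for step2-----"""

tree = parse_steps(SAMPLE.splitlines())
page = '\n'.join(render(tree))
print(page)
```

With this layout step3 ends up as a `<div>` nested inside the step2 `<div>`, so its text is also part of step2. To read from the log file instead, the open file object can be passed straight to `parse_steps`, since it only iterates over lines.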
