Python Regex Mark-Up

Posted: 2013-05-08 00:49:31

Tags: python regex

Hi everyone, I've run into a specific problem. I'm using Python's regular expressions to convert a marked-up source file into HTML.

Markup source:

[ 
# sometextsometextsometextsometextsometextsometext.  #

# sometextsometextsometextsometextsometextsometextsometextsometext
sometextsometextsometextsometextsometextsometext. #
]


[
hello i am a normal paragraph.
]

Desired output:

<ol> 
<li> sometextsometextsometextsometextsometextsometext.  </li>

<li> sometextsometextsometextsometextsometextsometextsometextsometext
sometextsometextsometextsometextsometextsometext. </li>
</ol>

<p>
hello i am a normal paragraph.
</p>

1 Answer

Answer 0 (score: 1)

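import re

# Convert every [...] block in mk.txt to HTML and write the result to newmk.txt.
with open('mk.txt') as f, open('newmk.txt', 'w') as g:
    text = f.read()
    # Non-greedy match of each [...] block; re.DOTALL lets '.' cross newlines
    square_groups = re.findall(r'\[.+?\]', text, flags=re.DOTALL)
    for group in square_groups:
        if '#' in group:  # '#' item markers mean an ordered list
            group = group.replace('[', '<ol>').replace(']', '</ol>')
            # '#' followed by an optional space and a word character opens an item
            group = re.sub(r'#(?= ?\w)', '<li>', group)
            # '#' preceded by a word character or a space closes an item
            group = re.sub(r'(?<=[\w ])#', '</li>', group)
        else:  # no '#' markers: treat the block as a plain paragraph
            group = group.replace('[', '<p>').replace(']', '</p>')
        g.write(group)
        g.write('\n')  # optional, just makes the output look 'nicer'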

With the input above in mk.txt, newmk.txt ends up containing the following:

<ol>
<li> sometextsometextsometextsometextsometextsometext.  </li>

<li> sometextsometextsometextsometextsometextsometextsometextsometext
sometextsometextsometextsometextsometextsometext. </li>
</ol>
<p>
hello i am a normal paragraph.
</p>
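If you want the same rules as a reusable function that works on strings (handy for testing without files), here is a minimal sketch. The names `markup_to_html` and `convert_block` are hypothetical, not from the answer above. It swaps the findall-and-write loop for `re.sub` with a replacement function, so each `[...]` block is rewritten in place; unlike the loop, any text between blocks (such as blank lines) is kept as-is instead of being replaced by a single newline.

```python
import re

# Hypothetical helper: applies the answer's rules to a string instead of files.
BLOCK = re.compile(r'\[(.*?)\]', re.DOTALL)  # each [...] block, non-greedy
OPEN_ITEM = re.compile(r'#(?= ?\w)')          # '#' right before an item's text
CLOSE_ITEM = re.compile(r'(?<=[\w ])#')       # '#' right after an item's text


def convert_block(match):
    """Turn one [...] block into <ol>...</ol> or <p>...</p>."""
    body = match.group(1)
    if '#' in body:  # '#' markers mean an ordered list
        body = OPEN_ITEM.sub('<li>', body)    # open items first...
        body = CLOSE_ITEM.sub('</li>', body)  # ...then close the remaining '#'
        return '<ol>' + body + '</ol>'
    return '<p>' + body + '</p>'  # anything else is a plain paragraph


def markup_to_html(text):
    """Rewrite every [...] block in text; text outside blocks is unchanged."""
    return BLOCK.sub(convert_block, text)
```

To use it with the files from the question, something like `g.write(markup_to_html(f.read()))` inside the same `with` statement should do. Passing a function to `re.sub` keeps the list-or-paragraph decision next to the match, and because the brackets are outside the capture group, only the delimiters of each block are replaced, not any stray `[` inside its text.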