Given a string like
"<p> >this line starts with an arrow <br /> this line does not </p>"
or
"<p> >this line starts with an arrow </p> <p> this line does not </p>"
how can I find the lines that start with an arrow and wrap them in a div, so that the result becomes:
"<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>"
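Answer 0: (score: 6)
Since it is HTML you are parsing, use the right tool for the job: an HTML parser such as BeautifulSoup.
Use find_all() to find every text node that starts with >, then wrap() each one in a new div tag: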
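from bs4 import BeautifulSoup

data = "<p> >this line starts with an arrow <br /> this line does not </p>"
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")

# Find every text node whose stripped content starts with '>' and wrap it in a <div>
for item in soup.find_all(string=lambda x: x.strip().startswith('>')):
    item.wrap(soup.new_tag('div'))

print(soup.prettify())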
Prints:
<p>
 <div>
  &gt;this line starts with an arrow
 </div>
 <br/>
 this line does not
</p>
Answer 1: (score: 3)
You can try the regex pattern >\s+(>.*?)<
import re

# A raw string avoids the doubled backslashes; \s+ and .*? are equivalent to \s{1,} and .{0,}?
regex = re.compile(r">\s+(>.*?)<")
testString = ""  # fill this in
matchArray = regex.findall(testString)
# matchArray now holds the list of captured groups, one per match
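Then replace each matched group with <div> matched_group </div>. The pattern looks for anything enclosed between "> >" and "<": the closing > of a tag, some whitespace, then text that starts with > and runs up to the next <.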
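The replacement step described above can be sketched with re.sub. This is only one way to do it, not necessarily what the answer's author had in mind: the function name wrap_arrow_lines and the two extra capture groups (used to put the surrounding "> " and "<" back unchanged) are assumptions made for this illustration.

```python
import re

# Same idea as the pattern above, with the surrounding delimiters captured
# so they can be written back unchanged:
#   group 1: the '>' closing the previous tag plus the whitespace after it
#   group 2: the text that starts with '>'
#   group 3: the '<' opening the next tag
ARROW_LINE = re.compile(r"(>\s+)(>.*?)(<)")


def wrap_arrow_lines(html):
    """Wrap each text run that starts with '>' in <div> ... </div>."""
    return ARROW_LINE.sub(r"\1<div> \2</div> \3", html)


first = "<p> >this line starts with an arrow <br /> this line does not </p>"
second = "<p> >this line starts with an arrow </p> <p> this line does not </p>"

print(wrap_arrow_lines(first))
print(wrap_arrow_lines(second))
```

Because the pattern requires whitespace between the tag's closing > and the arrow, text that follows a tag with no space in between is left untouched.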
Here is a demo on debuggex.
Answer 2: (score: 1)
You could try this regex,
>(\w[^<]*)
The Python code would be,
>>> import re
>>> # named html rather than str, so the built-in str type is not shadowed
>>> html = '"<p> >this line starts with an arrow <br /> this line does not </p>"'
>>> m = re.sub(r'>(\w[^<]*)', r"<div> >\1</div> ", html)
>>> m
'"<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>"'
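To see how this pattern behaves on both inputs from the question, the substitution can be wrapped in a small helper. The sketch below assumes the hypothetical names wrap_with_div and tight, which do not come from the answer. The third input is an extra case added here to show a limitation: the pattern treats any > directly followed by a word character as an arrow, including the > that closes a tag.

```python
import re

# Pattern from the answer above: a '>' immediately followed by a word
# character, then everything up to the next '<'.
ARROW_TEXT = re.compile(r">(\w[^<]*)")


def wrap_with_div(html):
    """Wrap each '>'-prefixed text run in <div> ... </div>."""
    return ARROW_TEXT.sub(r"<div> >\1</div> ", html)


first = "<p> >this line starts with an arrow <br /> this line does not </p>"
second = "<p> >this line starts with an arrow </p> <p> this line does not </p>"
# No whitespace after the opening tag, and no arrow in the text.
tight = "<p>no space after the tag</p>"

print(wrap_with_div(first))
print(wrap_with_div(second))
# The tag's own closing '>' is taken for an arrow, so the markup is altered.
print(wrap_with_div(tight))
```

If the input can contain text directly after a tag, as in the third case, an HTML parser is the safer choice.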