Question

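I searched on my own, but could not come up with the right expression.

I have an HTML file that contains variables between [], and I want to extract those variable names.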

<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>

It should give me an array containing "client_name", "client_company", "client_address", ...

Here is what I did:

vars = re.search(r'\[(.*)\]', html_template)
groups = vars.groups()
print(groups)

But it outputs ('client_name]</b><br/><b>[client_company',)

I tried using ^ and $, but without success.

Thanks for your help.

Answer 1

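Use a non-greedy quantifier, like this:

re.search(r'\[(.*?)\]', html_template)

Or a negated character class, like this:

re.search(r'\[([^\]]*)\]', html_template)

And use re.findall instead of re.search to get all the matching substrings.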
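
To make that concrete, here is a minimal sketch that runs both patterns through re.findall on the template from the question. The variable names lazy and negated are only illustrative.

```python
import re

html_template = """
<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
"""

# Non-greedy quantifier: stop at the first closing bracket.
lazy = re.findall(r"\[(.*?)\]", html_template)

# Negated character class: match anything except a closing bracket.
negated = re.findall(r"\[([^\]]*)\]", html_template)

print(lazy)
print(negated)
```

With a single capturing group, re.findall returns a list of the captured strings rather than match objects. One difference between the two patterns: `.` does not match a newline by default, while `[^\]]` does, so only the second pattern can match a placeholder that spans several lines.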

Answer 2

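Python has a very powerful library called BeautifulSoup, and I suggest you use it to parse the HTML. Parse out the div with this library first, then apply the regular expression to its contents.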

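import re
from bs4 import BeautifulSoup

html = '''
...some htmls...
<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
...more htmls...
'''
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"id": "client_info"})
# Search the inner HTML of each paragraph for [variable] placeholders.
for tag in div.find_all("p"):
    print(re.findall(r'\[([^\]]*)\]', tag.decode_contents()))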

There may be a way to handle the <br/> tags with BeautifulSoup itself, but I don't know it.
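
For what it's worth, the same "scope to the div first, then run the regex" idea can be sketched with only the standard library. This swaps BeautifulSoup for html.parser, so it is an alternative and not what this answer uses. The class name DivTextCollector and the extra [outside_var] placeholder are made up for illustration. Because only text nodes are collected, the <br/> tags need no special handling.

```python
import re
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Hypothetical helper: collect text inside <div id=target_id>."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 = outside the target div, >0 = inside it
        self.chunks = []  # text nodes found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


html = """
<p>[outside_var]</p>
<div id='client_info'>
    <p><b>[client_name]</b><br/><b>[client_company]</b></p>
    <p>[client_address]<br/>[client_CP]<br/>[client_city]</p>
</div>
"""

collector = DivTextCollector("client_info")
collector.feed(html)
collector.close()

# Run the regex only on the text gathered from inside the div.
names = re.findall(r"\[([^\]]*)\]", " ".join(collector.chunks))
print(names)
```

The placeholder outside the div is skipped, which is the main benefit of parsing before matching. A plain re.findall over the whole file would pick it up as well.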

Python - re - need help with a regular expression
