Question

Can I use BeautifulSoup to extract the numbers from the following HTML tags?

<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr>
<tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr>
<tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>

The Python code I tried:

from bs4 import BeautifulSoup
import re

html_code = """
<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr>
<tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr>
<tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>
"""
soup = BeautifulSoup(html_code,'html.parser')
rows = soup.findAll("tr", {"id": re.compile(r'tr_\d+')})
print(rows)

Expected output

1599656
1599657
1599644

Answer 1

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup('<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr><tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr><tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>', 'html.parser')

lines = soup.find_all('tr')

for line in lines:
    # print the first run of digits found in each id attribute
    print(re.findall(r'\d+', line['id'])[0])
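One caveat with indexing the result of re.findall with [0]: it raises IndexError when an id contains no digits at all. A minimal sketch of the regex step on its own, using plain strings instead of parsed tags; the helper name first_number and the extra "header" value are made up for illustration and are not part of the question:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else None

# The three ids from the question, plus one hypothetical id without digits.
for value in ["tr_1599656", "tr_1599657", "tr_1599644", "header"]:
    print(value, "->", first_number(value))
```

With real rows you would pass line['id'] instead of the literal strings and skip any None results.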

Please make an attempt of your own next time. :)

Answer 2

Assuming all the id attributes follow the pattern tr_XXXXXXX, this code will work:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_code, 'html.parser')
for t in soup.findAll('tr'):
    # drop the 3-character "tr_" prefix and keep the rest of the id
    print(t['id'][3:])

Output

1599656
1599657
1599644

The variable html_code contains the piece of HTML you posted in the question.
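If the page might also contain rows whose id does not follow the tr_XXXXXXX pattern, slicing off the first three characters would return garbage for them. A hedged sketch that combines the regex filter from the question with the extraction step; the names ID_PATTERN, extract_row_numbers and sample_html, and the extra "header" row, are assumptions added for illustration:

```python
import re
from bs4 import BeautifulSoup

# Assumed id shape: "tr_" followed only by digits.
ID_PATTERN = re.compile(r"^tr_(\d+)$")

def extract_row_numbers(html):
    """Return the numeric part of every <tr> id that looks like tr_<digits>."""
    soup = BeautifulSoup(html, "html.parser")
    numbers = []
    # A compiled regex as the id filter keeps only rows whose id matches it.
    for row in soup.find_all("tr", id=ID_PATTERN):
        numbers.append(ID_PATTERN.match(row["id"]).group(1))
    return numbers

# The rows from the question, plus one hypothetical row that should be skipped.
sample_html = """
<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr>
<tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr>
<tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>
<tr id="header"></tr>
"""

for number in extract_row_numbers(sample_html):
    print(number)
```

The function returns the numbers as strings; wrap each one in int() if you need integers.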
