Question

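I am trying to scrape tables from a website using BeautifulSoup / Python. For some reason, one of the tables appears to sit inside a comment tag. I can get the entire text inside the comment tag, but I can't figure out how to run find_all on that text so that I can locate the table within it.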

Is there a way to tell it that the text found inside the comment tag is actually more HTML?

hockey-reference.com/boxscores/201701260BOS.html

I am trying to find the 2 tables under the "Advanced Stats Report" section.

Answer 1

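import re
from bs4 import BeautifulSoup

# `soup` is assumed to be the already-parsed page, e.g. BeautifulSoup(page_html, 'lxml').
# Find the comment node whose text contains the table markup.
# (Older bs4 versions spell this argument text=; newer ones prefer string=.)
table_text = soup.find(string=re.compile('table class="adv sortable stats_table"'))

# Re-parse the comment text as HTML with bs4.
table_soup = BeautifulSoup(table_text, 'lxml')

# The commented-out table can now be searched like any other markup.
table_soup.find_all('tr')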
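
If the exact class string is not known in advance, a more general variant is to walk every comment node and re-parse each one. The sketch below is only an illustration under stated assumptions: SAMPLE_HTML, the id "BOS_adv", and the helper name tables_in_comments are made up for the example and are not taken from the real page, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample page: the table is wrapped in an HTML comment.
SAMPLE_HTML = """
<div id="all_advanced">
  <!--
  <table class="adv sortable stats_table" id="BOS_adv">
    <tr><th>Player</th><th>CF</th></tr>
    <tr><td>Player A</td><td>12</td></tr>
  </table>
  -->
</div>
"""


def tables_in_comments(html, parser="html.parser"):
    """Return every <table> found inside HTML comments of the document."""
    soup = BeautifulSoup(html, parser)
    tables = []
    # Comment is a NavigableString subclass, so isinstance picks out comment nodes.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Parse the comment body as a separate HTML fragment.
        fragment = BeautifulSoup(comment, parser)
        tables.extend(fragment.find_all("table"))
    return tables


tables = tables_in_comments(SAMPLE_HTML)
# Collect the cell text of each row in the first recovered table.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in tables[0].find_all("tr")
]
print(rows)
```

On a real page this would return every commented-out table, so filtering by class or id afterwards is probably still needed to pick out the two tables of interest.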
