Question

I am trying to extract "samsung galaxy s3 i9300" from the following HTML:

<a style="font-weight:bold;text-align:left; display: inline-block; height:25px;" href="product_info.php?type_id=1&amp;set_ad_type=&amp;product_id=5819985">samsung galaxy s3 i9300</a>

I am using Beautiful Soup and SoupStrainer. I tried filtering with ('a', {'style': 'font-weight:bold;'}) but had no luck. What is the exact filter?

Thanks!

Answer 1

If you pass in a value for href, Beautiful Soup filters against each tag's href attribute:

import re
soup.find_all(href=re.compile(r"product_info\.php\?"))

This returns every tag whose href contains "product_info.php?".
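
To make that concrete, here is a minimal sketch that runs the href filter against the markup from the question and pulls out the link text. The second link (about.php) and the variable names are my own additions for illustration, and I am assuming the built-in "html.parser" backend.

```python
import re
from bs4 import BeautifulSoup

# The first link is the markup from the question; the second is a made-up
# link added only to show that non-matching tags are skipped.
html = (
    '<a style="font-weight:bold;text-align:left; display: inline-block; height:25px;" '
    'href="product_info.php?type_id=1&amp;set_ad_type=&amp;product_id=5819985">'
    'samsung galaxy s3 i9300</a>'
    '<a href="about.php">about</a>'
)

soup = BeautifulSoup(html, "html.parser")

# Escape "." and "?" so they match literally instead of acting as regex operators.
product_link = re.compile(r"product_info\.php\?")

# find_all returns Tag objects; get_text() extracts the visible link text.
names = [a.get_text(strip=True) for a in soup.find_all("a", href=product_link)]
print(names)
```

Note that find_all gives back the tags themselves, so the product name comes from get_text() rather than from the href.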

Or you could do something like this:

# TEXT is the exact link text to match; a compiled regex such as the one above also works
# string= is the current name for the older text= argument
for link in soup.find_all('a', href=True, string='TEXT'):
    print(link['href'])

This returns every a tag that has an href attribute and whose text is exactly TEXT.

The Beautiful Soup documentation covers these filters in more detail.
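
Since the question asks about SoupStrainer specifically, here is a rough sketch of the same idea with parse_only. As far as I know, a plain string filter has to equal the whole attribute value, which would explain why {'style': 'font-weight:bold;'} finds nothing against the longer style string; a compiled regex only needs to match part of it. The second link and the variable names are made up for the example.

```python
import re
from bs4 import BeautifulSoup, SoupStrainer

# The first link is from the question; the second is a hypothetical
# non-bold link added to show what the strainer drops.
html = (
    '<a style="font-weight:bold;text-align:left; display: inline-block; height:25px;" '
    'href="product_info.php?type_id=1&amp;set_ad_type=&amp;product_id=5819985">'
    'samsung galaxy s3 i9300</a>'
    '<a style="color:red;" href="other.php">other phone</a>'
)

# Plain string: compared against the full style value, so this matches nothing here.
exact = SoupStrainer("a", attrs={"style": "font-weight:bold;"})
exact_soup = BeautifulSoup(html, "html.parser", parse_only=exact)
exact_names = [a.get_text(strip=True) for a in exact_soup.find_all("a")]

# Compiled regex: a fragment of the style value is enough to match.
partial = SoupStrainer("a", attrs={"style": re.compile(r"font-weight:\s*bold")})
partial_soup = BeautifulSoup(html, "html.parser", parse_only=partial)
partial_names = [a.get_text(strip=True) for a in partial_soup.find_all("a")]

print(exact_names)
print(partial_names)
```

The same regex approach should work with the href attribute if the style string turns out to be too fragile to match on.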
