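I have a use case: when the tag is link and its rel attribute is dns-prefetch or prefetch, I only need to report that DNS pre-resolution (or prefetching) is enabled.
I use a flag named pre_resolve_dns_enabled and set it to True, as shown below.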
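    # Python 3 import; on Python 2 the module is HTMLParser instead
    from html.parser import HTMLParser

    class Extractor(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.pre_resolve_dns_enabled = False
            # Initialise this flag too, otherwise is_prefetch_enabled()
            # raises AttributeError when no prefetch link was seen
            self.prefetch_enabled = False

        def feed(self, data):
            HTMLParser.feed(self, data)

        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) tuples
            if tag == 'link' and ('rel', 'dns-prefetch') in attrs:
                self.pre_resolve_dns_enabled = True
            if tag == 'link' and ('rel', 'prefetch') in attrs:
                self.prefetch_enabled = True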
I have written two methods that return the state.
        def is_pre_resolve_dns_enabled(self):
            return self.pre_resolve_dns_enabled

        def is_prefetch_enabled(self):
            return self.prefetch_enabled
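Is there a way to make the handle_starttag method generic, so that I don't have to hard-code each check, can query for similar rel values the same way, and can remove is_pre_resolve_dns_enabled and is_prefetch_enabled?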
Answer 0 (score: 0)
If you want to do this with bs4, you can use the following:
    from bs4 import BeautifulSoup

    # Naming the parser explicitly avoids the "no parser specified" warning
    soup = BeautifulSoup(html, 'html.parser')  # Or some XML-like structure
    # This selects all link tags whose rel attribute has the value dns-prefetch.
    if soup.select('link[rel="dns-prefetch"]'):
        self.pre_resolve_dns_enabled = True
    if soup.select('link[rel="prefetch"]'):
        self.prefetch_enabled = True
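
If you would rather stay with HTMLParser, one possible way to make handle_starttag generic is to stop keeping one flag per rel value and collect every rel token seen on link tags into a set, then query that set. This is only a minimal sketch: the names LinkRelExtractor, link_rels and has_rel are made up for illustration and are not part of the question's code or of any library.

```python
from html.parser import HTMLParser


class LinkRelExtractor(HTMLParser):
    """Collects every rel token found on <link> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.link_rels = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names
        if tag != 'link':
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == 'rel' and value:
                # rel may hold several space-separated tokens
                self.link_rels.update(value.lower().split())

    def has_rel(self, rel):
        """Return True if some <link> tag carried this rel token."""
        return rel in self.link_rels


parser = LinkRelExtractor()
parser.feed('<link rel="dns-prefetch" href="//example.com">'
            '<link rel="stylesheet" href="style.css">')
print(parser.has_rel('dns-prefetch'))  # True
print(parser.has_rel('prefetch'))      # False
```

With this shape, supporting a new rel value needs no new attribute or getter; has_rel('preconnect') works without touching the class.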