How to make this code generic

Posted: 2014-12-19 09:10:47

Tags: python dns html-parsing

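I have a use case where, whenever a tag is link and its rel attribute is dns-prefetch or prefetch, I just need to report that the corresponding feature (for example, DNS pre-resolution) is enabled.

I created a flag, pre_resolve_dns_enabled, and set it to True as shown below.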

from html.parser import HTMLParser


class Extractor(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        # Initialise both flags so the getters never raise AttributeError.
        self.pre_resolve_dns_enabled = False
        self.prefetch_enabled = False

    def feed(self, data):
        HTMLParser.feed(self, data)

    def handle_starttag(self, tag, attrs):
        if tag == 'link' and ('rel', 'dns-prefetch') in attrs:
            self.pre_resolve_dns_enabled = True
        if tag == 'link' and ('rel', 'prefetch') in attrs:
            self.prefetch_enabled = True
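For reference, HTMLParser.handle_starttag receives the tag name lowercased and attrs as a list of (name, value) tuples with lowercased attribute names, which is why the membership test ('rel', 'dns-prefetch') in attrs works. It compares the whole rel value, though, so a tag like rel="dns-prefetch preconnect" would not match. The small helper below (AttrDumper is a hypothetical name) just records what the callback is given:

```python
from html.parser import HTMLParser


class AttrDumper(HTMLParser):
    """Hypothetical helper: records the (tag, attrs) pairs handle_starttag receives."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


dumper = AttrDumper()
dumper.feed('<LINK REL="dns-prefetch" href="//example.com">')
dumper.close()
print(dumper.seen)
# [('link', [('rel', 'dns-prefetch'), ('href', '//example.com')])]
```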

I have written two methods to retrieve the state.

def is_pre_resolve_dns_enabled(self):
    return self.pre_resolve_dns_enabled

def is_prefetch_enabled(self):
    return self.prefetch_enabled

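Is there an efficient way to make the handle_starttag method generic, so that I don't have to hard-code each check, can query for any such feature in the same way, and can drop is_pre_resolve_dns_enabled and is_prefetch_enabled?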

1 Answer:

Answer 0 (score: 0)

If you want to do this with bs4 (BeautifulSoup 4), you can use:

from bs4 import BeautifulSoup

# Inside a method of the Extractor class; html is the page source (or other XML-like markup).
soup = BeautifulSoup(html, 'html.parser')
# select() returns a list of all <link> tags whose rel attribute equals the given value;
# an empty list is falsy.
if soup.select('link[rel=dns-prefetch]'):
    self.pre_resolve_dns_enabled = True
if soup.select('link[rel=prefetch]'):
    self.prefetch_enabled = True
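Without bs4, the question's generic approach can be sketched with the standard-library HTMLParser alone (Python 3's html.parser module). Instead of one flag and one getter per feature, the parser records every rel token it sees on a link tag and exposes a single query method. The class name LinkRelExtractor, the link_rels attribute, and the is_enabled method are illustrative names, not part of the original code.

```python
from html.parser import HTMLParser


class LinkRelExtractor(HTMLParser):
    """Hypothetical generic extractor: collects every rel token found on <link> tags."""

    def __init__(self):
        super().__init__()
        self.link_rels = set()

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        for name, value in attrs:
            # rel may hold several space-separated tokens, e.g. "dns-prefetch preconnect";
            # value is None for a bare attribute like <link rel>, so skip that case.
            if name == 'rel' and value:
                self.link_rels.update(value.lower().split())

    def is_enabled(self, rel):
        """Single query method replacing the per-feature getters."""
        return rel.lower() in self.link_rels


parser = LinkRelExtractor()
parser.feed('<link rel="dns-prefetch" href="//cdn.example.com">'
            '<link rel="prefetch" href="/next.html">')
parser.close()
print(parser.is_enabled('dns-prefetch'))  # True
print(parser.is_enabled('prefetch'))      # True
print(parser.is_enabled('preconnect'))    # False
```

With this design, checking a further hint such as preconnect needs no new flag or method, only a different argument to is_enabled.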