Question

I want to change this:

def has_class_but_no_id(tag):
    return tag.has_key('class') and not tag.has_key('id')

This function is written for Python 2, not Python 3.

My idea was to turn the HTML document into a list like this:

list_of_descendants = list(soup.descendants)

That way I could get the tags that have a class but no id, i.e. find all tags with class = blabla... but without id = .... I have no idea how to handle this problem.

Answer 1

The documentation says:

I renamed one method for compatibility with Python 3:


Tag.has_key() -> Tag.has_attr()
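To make the rename concrete, here is a minimal sketch of has_attr() on a single tag. The one-line HTML string is a made-up example, not taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, used only for illustration.
soup = BeautifulSoup('<p class="intro">Hello</p>', "html.parser")
tag = soup.p

has_class = tag.has_attr("class")  # Python 3 replacement for tag.has_key("class")
has_id = tag.has_attr("id")

# The attributes are stored in the tag.attrs dict, so a membership test agrees.
same_answer = ("class" in tag.attrs) == has_class

print(has_class, has_id, same_answer)  # True False True
```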

Also, exactly the same function is given in the documentation here:

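If none of the other matches work for you, define a function that takes an element as its only argument. The function should return True if the argument matches, and False otherwise.

Here's a function that returns True if a tag defines the "class" attribute but doesn't define the "id" attribute: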
def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')
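As a rough sketch of how this function is used, it can be passed straight to find_all(), which calls it on every tag in the document. The sample markup below is invented for the example.

```python
from bs4 import BeautifulSoup

def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')

# Hypothetical sample markup, not from the original question.
html_doc = """
<div id="main" class="box">
  <p class="note">class only</p>
  <p id="x">id only</p>
  <span class="tag" id="y">both</span>
  <a class="link" href="#">class only</a>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() only passes Tag objects to the function, never text nodes.
matches = soup.find_all(has_class_but_no_id)
names = [tag.name for tag in matches]
print(names)  # ['p', 'a']
```

Because find_all() already skips text nodes, no separate NavigableString filtering is needed with this approach.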

Answer 2

Hey, I solved this problem.

What I had to do was:

1. Collect all tags (BeautifulSoup) and all of their children (contents)

import bs4
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, "html.parser")
list_of_descendants = list(soup.descendants)

2. Remove all NavigableStrings (because they do not have the has_attr() method)

def terminate_navis(list_of_some):
    # Keep only Tag objects; NavigableString has no has_attr() method.
    return [elem for elem in list_of_some if isinstance(elem, bs4.element.Tag)]


new_list = terminate_navis(list_of_descendants)


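def contents_adding(arg_list):
    # This function adds the direct children of each tag to the list again.
    # Note: soup.descendants already yields every nested tag, so this step
    # should not find anything new when arg_list comes from step 1.

    new_list = list(arg_list)  # copy, so the caller's list is not modified
    seen = {id(elem) for elem in new_list}

    for elem in arg_list:

        for child in terminate_navis(elem.contents):

            if id(child) not in seen:  # de-duplicate by identity, keep order
                seen.add(id(child))
                new_list.append(child)

    return new_list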
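As a possible shortcut for steps 1 and 2, find_all(True) returns every tag and no text nodes. The sketch below compares it with filtering soup.descendants by type; the HTML string is an invented example.

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html_doc = '<div class="box">text <b>bold</b><i id="x">it</i></div>'
soup = BeautifulSoup(html_doc, "html.parser")

# Steps 1 and 2 done by hand: take all descendants, then drop the text nodes.
tags_by_filter = [
    elem for elem in soup.descendants
    if isinstance(elem, bs4.element.Tag)
]

# The same result in one call: True matches every tag, but no strings.
tags_by_find_all = soup.find_all(True)

same_objects = (
    len(tags_by_filter) == len(tags_by_find_all)
    and all(a is b for a, b in zip(tags_by_filter, tags_by_find_all))
)
print([tag.name for tag in tags_by_find_all])  # ['div', 'b', 'i']
print(same_objects)  # True
```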

3. Filter the tags so that only those with the attribute 'class' (using has_attr) and without 'id' (also using has_attr) remain

def justcl(tag_lists):
    # Keep only the tags that have a 'class' attribute.
    return [elem for elem in tag_lists if elem.has_attr('class')]

def notids(class_lists):
    # Keep only the tags that do not have an 'id' attribute.
    return [elem for elem in class_lists if not elem.has_attr('id')]

All of the collected tags end up in a list that can be printed to the screen.

Print it directly, use a for loop, and so on...
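For comparison, the three steps above can probably be collapsed into a single pass over soup.descendants. This is only a sketch under the assumption that the goal is "tags with a class but no id"; the sample markup is invented.

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical sample markup, not from the original question.
html_doc = """
<div id="main" class="box">
  <p class="note">class only</p>
  <p id="x">id only</p>
  <a class="link" href="#">class only</a>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# One pass: skip text nodes, require 'class', reject 'id'.
result = [
    elem for elem in soup.descendants
    if isinstance(elem, bs4.element.Tag)
    and elem.has_attr("class")
    and not elem.has_attr("id")
]

result_names = [tag.name for tag in result]
for tag in result:
    print(tag.name, tag["class"])
```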

Replacing the method has_key in Python 3
