Question

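I'm writing a script that scrapes a web page with Beautiful Soup. The page contains many divs, and I want to parse various pieces of information out of each one. For each div I create a dictionary whose key-value pairs hold the different things I pull from the page, and I then append each dictionary to a list.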

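As part of the script, I want to give each div an incrementing integer id (0, 1, 2, ...) as a key-value pair. Everything in the script works except the code that assigns this id. The dummy data I'm using has 12 divs. The script produces 12 dictionaries, but every one of them has id = 11.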
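
For reference, `enumerate` pairs each item with a counter that starts at 0 by default (the optional `start` argument shifts it), which is exactly the 0, 1, 2, ... sequence described above. A minimal sketch, using a hypothetical list of strings in place of the real divs:

```python
# Hypothetical stand-ins for the 12 divs returned by Beautiful Soup.
divs = [f"div-{n}" for n in range(12)]

# enumerate yields (index, item) tuples; the index starts at 0 by default.
pairs = list(enumerate(divs))
print(pairs[:3])  # [(0, 'div-0'), (1, 'div-1'), (2, 'div-2')]

# Passing start=1 shifts the counter but not the items.
print(next(enumerate(divs, start=1)))  # (1, 'div-0')
```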

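Below is the code I'm using. I've commented out the body of one of the functions because it doesn't seem relevant to this particular problem. I suspect the problem is in my `count_headers` function, but I thought I was using `enumerate` correctly there.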

def extract_metadata(divs):

    header_dict_list = []

    for div in divs:
        header_dict = {}

        header_index = enumerate(divs,0)

        def count_headers(div):
            for num, div in header_index:
                header_id = num
                header_dict['id'] = header_id

        def extract_header_stuff(div):
            # A bunch of stuff happens here to pull out different pieces of
            # information and populate header_dict for each div.
            pass

        extract_header_stuff(div)

        count_headers(div)

        header_dict_list.append(header_dict)

    print (header_dict_list)
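
The symptom can be reproduced without Beautiful Soup. In this minimal sketch a plain list of strings stands in for the divs and the scraping is left out (both assumptions), and `buggy_extract_ids` is a hypothetical name for a stripped-down copy of the loop above. `count_headers` walks the entire `enumerate(divs)` for every div, so `header_dict['id']` is overwritten 12 times and always ends at the last index:

```python
def buggy_extract_ids(divs):
    """Hypothetical stripped-down copy of the question's loop."""
    header_dict_list = []
    for div in divs:
        header_dict = {}
        # A fresh enumerate is created for every div...
        header_index = enumerate(divs, 0)
        # ...and fully consumed here, so 'id' is set to 0, 1, ..., 11
        # in turn and only the final value survives.
        for num, _ in header_index:
            header_dict['id'] = num
        header_dict_list.append(header_dict)
    return header_dict_list


# Hypothetical stand-ins for the 12 divs.
result = buggy_extract_ids([f"div-{n}" for n in range(12)])
print([d['id'] for d in result])  # [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11]
```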

Answer 1

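You're misusing `enumerate`. `count_headers` loops over the whole `enumerate(divs)` for every single div, overwriting `header_dict['id']` on each pass, so every dictionary ends up with the last index (11). Let `enumerate` drive the outer loop instead: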

    for header_id, div in enumerate(divs):
        header_dict = {'id': header_id}

        # You probably don't want to define `extract_header_stuff` in the loop
        # either.
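
Putting that together, a minimal sketch of the whole function might look like the following. It assumes `extract_header_stuff` is moved out of the loop and takes both the div and the dict to fill; its body here is a hypothetical placeholder (it just stores `str(div)`), since the real parsing logic isn't shown. It also returns the list rather than only printing it.

```python
def extract_header_stuff(div, header_dict):
    """Hypothetical placeholder for the real Beautiful Soup parsing."""
    header_dict['text'] = str(div)


def extract_metadata(divs):
    header_dict_list = []
    # enumerate drives the outer loop, so each div gets its own index.
    for header_id, div in enumerate(divs):
        # Use the string 'id' as the key; the loop variable holds the value.
        header_dict = {'id': header_id}
        extract_header_stuff(div, header_dict)
        header_dict_list.append(header_dict)
    return header_dict_list


result = extract_metadata(["a", "b", "c"])
print(result)
# [{'id': 0, 'text': 'a'}, {'id': 1, 'text': 'b'}, {'id': 2, 'text': 'c'}]
```

Naming the loop variable `header_id` instead of `id` also avoids shadowing Python's built-in `id()`.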

Can't provide an incrementing integer ID with enumerate
