Question

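I'm currently building a web scraper with requests and BeautifulSoup. I use a for loop to build a list of dictionaries whose values are the href attributes of the a tags. The problem is that every entry ends up being the last href on the page. Here is the output when I print the final result: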

[{'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}, {'link': '/terms'}]

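I'm not sure why it only keeps the last value. I assume it's because, on the final pass through the loop, every key with the same name gets assigned that value. How can I fix this? Here is the code.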

import json
import requests
from bs4 import BeautifulSoup

tags_dict = {}
tags_list = []

r = requests.get("http://chicosadventures.com/")

soup = BeautifulSoup(r.content, "lxml")


for link in soup.find_all('a'):
    tags_dict['link'] = link.get('href')
    tags_list.append(tags_dict)

dump = json.dumps(tags_list)
print(dump)

Answer 1

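Your problem is tags_dict. You are storing a reference to that one dictionary in the list over and over, and because it is a reference, the last value assigned shows up in every entry. I changed the code to create a new dict object on each iteration, and now it works correctly: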

import json
import requests
from bs4 import BeautifulSoup

tags_list = []
r = requests.get("http://chicosadventures.com/")
soup = BeautifulSoup(r.content, "lxml")

for link in soup.find_all('a'):
    tags_list.append({"link": link.get('href')})

dump = json.dumps(tags_list)
print(dump)

Output:

[{"link": "/"}, {"link": "/about_chico"}, {"link": "/about_the_author"}, {"link": "/about_the_illustrator"}, {"link": "/chico_in_the_news_"}, {"link": "/order_your_copy"}, {"link": "/contact_us"}, {"link": "/about_chico"}, {"link": "/about_the_author"}, {"link": "/about_the_illustrator"}, {"link": "/chico_in_the_news_"}, {"link": "/order_your_copy"}, {"link": "/contact_us"}, {"link": "/privacy"}, {"link": "javascript:print()"}, {"link": "http://www.ebtech.net/"}, {"link": "/terms"}]
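
The shared-reference behavior can be reproduced without any network access. The following is a minimal sketch, assuming a short hypothetical list of href strings (`hrefs`) in place of the values returned by soup.find_all('a'); the names `shared`, `aliased`, and `fresh` are illustrative and not part of the original code.

```python
# Hypothetical stand-in for the href values taken from soup.find_all('a').
hrefs = ["/", "/about_chico", "/terms"]

# Buggy pattern: a single dict object is mutated and appended repeatedly.
shared = {}
aliased = []
for href in hrefs:
    shared["link"] = href
    aliased.append(shared)  # appends a reference, not a copy

# Every entry is the same object, so all of them show the last value written.
print(aliased)
print(all(entry is shared for entry in aliased))

# Fixed pattern: build a new dict on each iteration.
fresh = [{"link": href} for href in hrefs]
print(fresh)
```

The first print shows the same dictionary three times, each with '/terms', and the identity check prints True. The list comprehension is equivalent to the loop in the answer, since it creates a separate dict for each href.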
