I am retrieving data from a website and writing it to a TSV file. However, my code only returns the first entry instead of the whole set. Please help.
import csv
import urllib.request
from bs4 import BeautifulSoup

BASE_URL = "http://www.parliament.go.ke/index.php/the-national-assembly/house-business/hansard"

# Read BASE_URL into a BeautifulSoup object
html = urllib.request.urlopen(BASE_URL).read()
soup = BeautifulSoup(html, "html.parser")

# Grab the <div class="itemList"> elements that hold links and dates to all hansard PDFs
hansards = soup.find_all("div", "itemList")

# Write all hansards to a TSV file
with open("hansards.tsv", "wt") as f:
    fieldnames = ("date", "hansard_url")
    output = csv.writer(f, delimiter="\t")
    for div in hansards:
        hansard_link = [BASE_URL + div.a["href"]]
        hansard_date = soup.find("h3", "catItemTitle").string
        output.writerow([hansard_date, hansard_link])
        print(hansard_date)
        print(hansard_link)

print("Done Writing File")
Answer 0 (score: 0)
You are targeting the wrong DIV class, and `soup.find` always returns the first match on the entire page, so every row gets the same date. It should be:

# Grab the <div class="itemContainer"> elements that hold links and dates to all hansard PDFs
hansards = soup.find_all("div", "itemContainer")

And the for loop should search inside each div rather than the whole soup:

for div in hansards:
    hansard_link = [BASE_URL + div.a["href"]]
    hansard_date = div.find("h3", "catItemTitle").string
Thanks!
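To see why searching within each `div` matters, here is a minimal runnable sketch of the corrected loop. It parses an inline HTML snippet that mimics the page structure described above (the sample dates and PDF paths are made up for illustration), and collects rows instead of hitting the live site:

```python
import csv
import io
from bs4 import BeautifulSoup

BASE_URL = "http://www.parliament.go.ke/index.php/the-national-assembly/house-business/hansard"

# Hypothetical sample markup mimicking the itemContainer/catItemTitle structure.
SAMPLE_HTML = """
<div class="itemContainer">
  <h3 class="catItemTitle">Wednesday, 1st March 2017</h3>
  <a href="/hansard1.pdf">PDF</a>
</div>
<div class="itemContainer">
  <h3 class="catItemTitle">Thursday, 2nd March 2017</h3>
  <a href="/hansard2.pdf">PDF</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for div in soup.find_all("div", "itemContainer"):
    # div.find (not soup.find) scopes the lookup to this container,
    # so each row gets its own date rather than the first date on the page.
    hansard_link = BASE_URL + div.a["href"]
    hansard_date = div.find("h3", "catItemTitle").string
    rows.append((hansard_date, hansard_link))

# Write the rows as TSV (an in-memory buffer here; a file works the same way).
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerows(rows)
print(buf.getvalue())
```

With the scoped `div.find`, the two sample rows come out with distinct dates, which is exactly the behavior the original `soup.find` version lost.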