I am retrieving data from a website and writing it to a TSV file. However, my code only returns the first entry instead of the whole set. Please help.
import csv
import urllib.request
from bs4 import BeautifulSoup

BASE_URL = "http://www.parliament.go.ke/index.php/the-national-assembly/house-business/hansard"

# Read BASE_URL into a BeautifulSoup object
html = urllib.request.urlopen(BASE_URL).read()
soup = BeautifulSoup(html, "html.parser")

# Grab the <div class="itemList"> elements that hold links and dates to all hansard PDFs
hansards = soup.find_all("div", "itemList")

# Write all hansards to a TSV file
with open("hansards.tsv", "wt") as f:
    fieldnames = ("date", "hansard_url")
    output = csv.writer(f, delimiter="\t")
    for div in hansards:
        hansard_link = [BASE_URL + div.a["href"]]
        hansard_date = soup.find("h3", "catItemTitle").string
        output.writerow([hansard_date, hansard_link])
        print(hansard_date)
        print(hansard_link)

print("Done Writing File")
Answer 0 (score: 0)
You are targeting the wrong DIV class, and `soup.find` always returns the first match on the entire page, so every row gets the same date. It should be:

# Grab the <div class="itemContainer"> elements that hold links and dates to all hansard PDFs
hansards = soup.find_all("div", "itemContainer")

And the for loop should search inside each div rather than the whole soup:

for div in hansards:
    hansard_link = [BASE_URL + div.a["href"]]
    hansard_date = div.find("h3", "catItemTitle").string
Thanks!
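To see why searching within each `div` matters, here is a minimal runnable sketch of the corrected loop. It parses an inline HTML snippet that mimics the page structure described above (the sample dates and PDF paths are made up for illustration), and collects rows instead of hitting the live site:

```python
import csv
import io
from bs4 import BeautifulSoup

BASE_URL = "http://www.parliament.go.ke/index.php/the-national-assembly/house-business/hansard"

# Hypothetical sample markup mimicking the itemContainer/catItemTitle structure.
SAMPLE_HTML = """
<div class="itemContainer">
  <h3 class="catItemTitle">Wednesday, 1st March 2017</h3>
  <a href="/hansard1.pdf">PDF</a>
</div>
<div class="itemContainer">
  <h3 class="catItemTitle">Thursday, 2nd March 2017</h3>
  <a href="/hansard2.pdf">PDF</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for div in soup.find_all("div", "itemContainer"):
    # div.find (not soup.find) scopes the lookup to this container,
    # so each row gets its own date rather than the first date on the page.
    hansard_link = BASE_URL + div.a["href"]
    hansard_date = div.find("h3", "catItemTitle").string
    rows.append((hansard_date, hansard_link))

# Write the rows as TSV (an in-memory buffer here; a file works the same way).
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerows(rows)
print(buf.getvalue())
```

With the scoped `div.find`, the two sample rows come out with distinct dates, which is exactly the behavior the original `soup.find` version lost.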