Question

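I'm new to BeautifulSoup and I'm trying to learn how to scrape search results from a website.

I've been able to practise on ordinary pages, but I'm still confused about how to handle the results of a search form.

For example, I want to find the names and addresses of all libraries in NSW.

How would I go about this? How do I retrieve results based on search criteria and open the results page with BeautifulSoup?

Sorry for the beginner question!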

Xx

Answer 1

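import requests
from bs4 import BeautifulSoup

library_list = []

# Query-string parameters corresponding to the fields of the search form
data = {'action': 'LibSearch', 'termtype': 'Keyword', 'libstate': 'NSW', 'dosearch': 'Search', 'libtype': 'All', 'chunk': 20}

# requests encodes the parameters into the URL and fetches the results page
page = requests.get("http://www.nla.gov.au/apps/libraries/", params=data)
soup = BeautifulSoup(page.content, 'html.parser')

# Collect every link on the results page
libraries = soup.find_all("a")

# Skip the first five links (presumably page navigation rather than results)
for library in libraries[5:]:
    print(library.text)
    library_list.append(library.text)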

Output:

Design Centre Enmore Library
Sydney Institute

A.B. 'Banjo' Paterson Library
Sydney Grammar School
.
.

ANSTO Library
Australian Nuclear Science and Technology Organisation

.
.
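To see how the search criteria end up in the request, it can help to build the URL without sending anything. The sketch below uses the standard library's `urlencode`, which should produce the same query string that `requests` builds from `params`; the helper name `build_search_url` is made up for illustration, and the parameter names are simply copied from the answer above.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.nla.gov.au/apps/libraries/"

def build_search_url(state="NSW", chunk=20):
    """Return the search URL for a state (hypothetical helper, no network access)."""
    params = {
        "action": "LibSearch",
        "termtype": "Keyword",
        "libstate": state,   # the search criterion from the form
        "dosearch": "Search",
        "libtype": "All",
        "chunk": chunk,      # number of results requested
    }
    # Encode the dictionary as key=value pairs joined by '&'
    return BASE_URL + "?" + urlencode(params)

print(build_search_url())
```

Pasting the printed URL into a browser is a quick way to check that the parameters match what the search form itself submits.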

Note: change the value of the chunk parameter in data to retrieve as many libraries as you want.
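One possible way to make the chunk size easy to change is to wrap the request in a function and keep the parsing separate, so the parsing can be tried on a local HTML string first. This is only a sketch: `fetch_results_html` and `extract_link_texts` are hypothetical names, the `skip` default of 5 just mirrors the slice in the answer, and the sample HTML is invented and does not reflect the real page's markup.

```python
import requests
from bs4 import BeautifulSoup

def fetch_results_html(state="NSW", chunk=20):
    """Download the results page (hypothetical helper; performs a network request)."""
    data = {'action': 'LibSearch', 'termtype': 'Keyword', 'libstate': state,
            'dosearch': 'Search', 'libtype': 'All', 'chunk': chunk}
    page = requests.get("http://www.nla.gov.au/apps/libraries/", params=data, timeout=30)
    page.raise_for_status()  # fail loudly on HTTP errors
    return page.content

def extract_link_texts(html, skip=5):
    """Return the text of every <a> tag after the first `skip` links."""
    soup = BeautifulSoup(html, 'html.parser')
    texts = [a.get_text(strip=True) for a in soup.find_all("a")]
    # Drop empty strings left by links that contain no text
    return [t for t in texts[skip:] if t]

# Invented sample markup, used only to exercise the parser offline
sample_html = """
<a href="/">Home</a>
<a href="/lib/1">Example Library One</a>
<a href="/lib/2">Example Library Two</a>
"""
print(extract_link_texts(sample_html, skip=1))
```

Against the live site, the call would be something like `extract_link_texts(fetch_results_html(chunk=100))`, assuming the page layout still matches what the answer relied on.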

