I want to scrape the links from each page, then go to the next page and do the same. Here is my code for scraping the links from the first page:
import requests
from bs4 import BeautifulSoup

page = 'https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet'
request = requests.get(page)
soup = BeautifulSoup(request.text, 'lxml')
links = soup.findAll('a', class_='search-list__item')

url = []
prefix = "https://www.booli.se"
for link in links:
    url.append(prefix + link["href"])
I tried the following for the first three pages, but it did not work.
import re
import requests
from bs4 import BeautifulSoup

url = []
prefix = "https://www.booli.se"
with requests.Session() as session:
    for page in range(4):
        response = session.get("https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page=%f" % page)
        soup = BeautifulSoup(response.content, "html.parser")
        links = soup.findAll('a', class_='search-list__item')
        for link in links:
            url.append(prefix + link["href"])
Answer 0 (score: 2)
First, you have to write code that works correctly for a single page.
Then you have to put your scraping code inside a loop:
url = "https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page=1"
while True:
    # code goes here
You will notice there is a page=number at the end of the link. You have to run the loop over these URLs by changing that number:

i = 1
url = "https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page=" + str(i)
while True:
    i = i + 1
    page = requests.get(url)
    if page.status_code != 200:
        break
    url = "https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page=" + str(i)
    # Your scraping code goes here
I used the if statement so the loop does not run forever; it stops once it has gone past the last page.
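Putting the pieces together, here is a minimal sketch of that approach: keep fetching successive pages until the server stops returning 200. The URL pattern and the `search-list__item` selector are taken from the question; the helper names are my own and the sketch is untested against the live site.

```python
import requests
from bs4 import BeautifulSoup

PREFIX = "https://www.booli.se"
TEMPLATE = "https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page="

def page_url(i):
    # URL of result page i; pagination appears to start at page=1.
    return TEMPLATE + str(i)

def scrape_all_pages():
    # Request successive pages until the server stops answering 200,
    # collecting the absolute link of every search-list item on the way.
    collected = []
    i = 1
    with requests.Session() as session:
        while True:
            response = session.get(page_url(i))
            if response.status_code != 200:
                break
            soup = BeautifulSoup(response.text, "html.parser")
            for link in soup.findAll("a", class_="search-list__item"):
                collected.append(PREFIX + link["href"])
            i += 1
    return collected
```

Reusing one `requests.Session` avoids reopening a connection for every page, which matters once the loop runs over dozens of result pages.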
Answer 1 (score: 0)
Yes, I got it working. Thank you. Here is the code for the first two pages:
urls = []
for page in range(3):
    urls.append("https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page={}".format(page))
pages = urls[1:]  # drop page=0, keeping page=1 and page=2

import requests
from bs4 import BeautifulSoup

inturl = []
prefix = "https://www.booli.se"
for page in pages:
    request = requests.get(page)
    soup = BeautifulSoup(request.text, 'lxml')
    links = soup.findAll('a', class_='search-list__item')
    for link in links:
        inturl.append(prefix + link["href"])
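The `urls[1:]` slice above is only needed because the range starts at 0. Starting the range at 1 avoids it entirely; a small sketch under the same assumptions (same URL pattern and selector as in the question, helper names are mine):

```python
import requests
from bs4 import BeautifulSoup

PREFIX = "https://www.booli.se"

def build_urls(first, last):
    # Build the page URLs for pages first..last directly,
    # so no slice is needed to skip a nonexistent page 0.
    template = "https://www.booli.se/slutpriser/goteborg/22/?objectType=L%C3%A4genhet&page={}"
    return [template.format(i) for i in range(first, last + 1)]

def scrape(pages):
    # Fetch each page and collect the absolute listing links.
    found = []
    with requests.Session() as session:
        for page in pages:
            soup = BeautifulSoup(session.get(page).text, "html.parser")
            for link in soup.findAll("a", class_="search-list__item"):
                found.append(PREFIX + link["href"])
    return found
```

For example, `scrape(build_urls(1, 2))` covers the same first two pages as the code above.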