Scrape multiple pages with loops in Python

Time: 2017-10-12 10:04:03

Tags: python loops beautifulsoup scrape

I successfully scraped the first page of the website, but when I tried to scrape multiple pages, the code ran but the results were completely wrong.

Code:

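import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

for num in range(1, 15):  # pages 1 to 14
    # Note: this is a plain string, not an f-string, so "{num}" is sent literally
    res = requests.get('http://www.abcde.com/Part?Page={num}&s=9&type=%8172653').text
    soup = BeautifulSoup(res, "lxml")
    # Print the absolute URL of every element with class "article-title"
    for item in soup.select(".article-title"):
        print(urljoin('http://www.abcde.com', item['href']))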

Only one number changes in each page's URL, for example:

http://www.abcde.com/Part?Page=1&s=9&type=%8172653
http://www.abcde.com/Part?Page=2&s=9&type=%8172653

There are 14 pages in total.

My code runs, but it just prints the URLs from the first page 14 times. I expected the loop to print all the different URLs from the different pages.
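A quick way to see why every iteration behaves the same: without the `f` prefix, `{num}` is ordinary text, so the loop requests the identical URL 14 times (the server presumably falls back to the first page when it receives `Page={num}`). A minimal sketch comparing the two string forms, using the URL from the question:

```python
# Without the f prefix, "{num}" is just text: every iteration builds the same string.
plain = {"http://www.abcde.com/Part?Page={num}&s=9&type=%8172653" for num in range(1, 15)}

# With the f prefix, the current value of num is substituted into the URL.
interpolated = {f"http://www.abcde.com/Part?Page={num}&s=9&type=%8172653" for num in range(1, 15)}

print(len(plain))         # 1
print(len(interpolated))  # 14
```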

1 answer:

Answer 0 (score: 2)

As Jon Clements pointed out, format the URL like this:

res = requests.get('http://www.abcde.com/Part?Page={}&s=9&type=%8172653'.format(num)).text
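For comparison, here is a minimal sketch of three equivalent ways to build a page URL in Python 3. `num = 3` is only an example value; in the scraper it would come from the loop.

```python
num = 3

# 1. Positional placeholder, as in the answer above
by_position = "http://www.abcde.com/Part?Page={}&s=9&type=%8172653".format(num)

# 2. Named placeholder: the question's original template works unchanged
#    once .format(num=num) is called on it
by_name = "http://www.abcde.com/Part?Page={num}&s=9&type=%8172653".format(num=num)

# 3. f-string (Python 3.6+): the expression inside {} is evaluated in place
by_fstring = f"http://www.abcde.com/Part?Page={num}&s=9&type=%8172653"

print(by_position == by_name == by_fstring)  # True
print(by_fstring)  # http://www.abcde.com/Part?Page=3&s=9&type=%8172653
```

Old-style `%` formatting is awkward here because the URL already contains a literal `%81`, which would have to be escaped as `%%81`.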

You can find more about Python format strings at pyformat.info.
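Putting the pieces together, here is a minimal end-to-end sketch of the corrected loop. To keep it runnable without third-party packages, it swaps BeautifulSoup's `select(".article-title")` for a small parser built on the standard library's `html.parser`, and `requests` for `urllib.request`; it assumes the `.article-title` elements carry an `href` attribute. The names `TitleLinkParser`, `fetch`, and `scrape` are hypothetical, and `www.abcde.com` is the placeholder host from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE = "http://www.abcde.com"  # placeholder host from the question


class TitleLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of tags with class 'article-title'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "article-title" in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def fetch(url):
    # Default fetcher using urllib; requests.get(url).text would work equally well.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape(pages=range(1, 15), get_html=fetch):
    links = []
    for num in pages:
        # The f prefix is what makes each iteration request a different page.
        url = f"{BASE}/Part?Page={num}&s=9&type=%8172653"
        parser = TitleLinkParser()
        parser.feed(get_html(url))
        links.extend(urljoin(BASE, href) for href in parser.hrefs)
    return links
```

Calling `scrape()` would fetch pages 1 to 14 and return the absolute article URLs. Passing a different `get_html` callable makes it easy to check the loop against canned HTML without touching the network.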