Trying to get the scraper to move on to the next web page

Time: 2019-04-02 18:05:26

Tags: loops for-loop web-scraping beautifulsoup iterator

This is the code I have so far:

for page in range(1, 5):
    guitarPage = requests.get('https://www.guitarguitar.co.uk/guitars/electric/page-'.format(page)).text
    soup = BeautifulSoup(guitarPage, 'lxml')
    # row = soup.find(class_='row products flex-row')
    guitars = soup.find_all(class_='col-xs-6 col-sm-4 col-md-4 col-lg-3')

This is the actual loop that iterates over the products:

    for guitar in guitars:
        title_text = guitar.h3.text.strip()
        print('Guitar Name: ', title_text)
        price = guitar.find(class_='price bold small').text.strip()
        print('Guitar Price: ', price)
        time.sleep(0.5)

So far the code keeps scraping the same page and never moves on to the next one. The site's URLs follow the pattern page-2, page-3, and so on.
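
For reference, a quick check (an added illustration, not part of the original post) shows the symptom: without a {} placeholder in the string, str.format has nothing to substitute, so the request hits the same URL on every iteration.

    base = 'https://www.guitarguitar.co.uk/guitars/electric/page-'
    # No placeholder: format() returns the string unchanged for every page
    print(base.format(2))           # ...page-
    print(base.format(3))           # ...page-
    # With a placeholder, the page number is inserted as expected
    print((base + '{}').format(2))  # ...page-2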

1 answer:

Answer 0 (score: 0)

You have to add {} to the link. I also added the time module.

      import requests
      from bs4 import BeautifulSoup
      import time

      for page in range(1, 5):
          guitarPage = requests.get('https://www.guitarguitar.co.uk/guitars/electric/page-{}'.format(page)).text
          soup = BeautifulSoup(guitarPage, 'lxml')
          # row = soup.find(class_='row products flex-row')
          guitars = soup.find_all(class_='col-xs-6 col-sm-4 col-md-4 col-lg-3')
          for guitar in guitars:
              title_text = guitar.h3.text.strip()
              price = guitar.find(class_='price bold small').text.strip()
              print('Guitar Name: ', title_text, 'Guitar Price: ', price)
              time.sleep(0.5)
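
As a possible refinement (an untested sketch, not part of the original answer), the same loop can reuse a requests.Session, check the HTTP status, and guard against product cards that lack a price element, so one bad listing does not raise an AttributeError:

      import requests
      from bs4 import BeautifulSoup
      import time

      session = requests.Session()  # reuse the connection across pages

      for page in range(1, 5):
          response = session.get(
              'https://www.guitarguitar.co.uk/guitars/electric/page-{}'.format(page))
          response.raise_for_status()  # fail fast on a 4xx/5xx response
          soup = BeautifulSoup(response.text, 'lxml')
          for guitar in soup.find_all(class_='col-xs-6 col-sm-4 col-md-4 col-lg-3'):
              title = guitar.h3.text.strip() if guitar.h3 else 'N/A'
              price_tag = guitar.find(class_='price bold small')
              price = price_tag.text.strip() if price_tag else 'N/A'
              print('Guitar Name: ', title, 'Guitar Price: ', price)
          time.sleep(0.5)  # pause between pages rather than between items

The per-item sleep in the original answer works too; moving it outside the inner loop just shortens the total run time while still pausing between requests to the server.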