Scraping a paginated website with BeautifulSoup in Python

Date: 2016-05-10 16:41:06

Tags: python web pagination web-scraping beautifulsoup

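I've just started learning Python. I'm trying to scrape all the phone numbers from a paginated website, but my code never follows the pagination links; it just loops over the same page. Any advice would be appreciated.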

from bs4 import BeautifulSoup
import requests

for i in range(5000):
    url = "http://www.mobil123.com/mobil?type=used&page_number=1".format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

    for record in soup.findAll('div', {"class": "card-contact-wrap"}):
        for data in soup.findAll('div', {"data-get-content": "#whatsapp"}):
            print(record.find('li').text)
            print(data.text)

2 answers:

Answer 0 (score: 1)

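You left out the string format placeholder, so .format(i) has nothing to substitute and every request goes to page 1. Change url = "...." to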

  url = "http://www.mobil123.com/mobil?type=used&page_number={0}".format(i)
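To see why the original loop never left the first page, here is a minimal sketch comparing the two strings. It makes no network requests; the names `base`, `template`, `unchanged` and `pages` are only for illustration, and starting the range at 1 assumes the site numbers its pages from 1.

```python
# URL from the question: it contains no {} placeholder.
base = "http://www.mobil123.com/mobil?type=used&page_number=1"
# URL from this answer: {0} marks where the page number goes.
template = "http://www.mobil123.com/mobil?type=used&page_number={0}"

# With no placeholder, format() silently ignores its argument
# and returns the same string every time.
unchanged = [base.format(i) for i in range(1, 4)]

# With {0}, each call produces a different page URL.
pages = [template.format(i) for i in range(1, 4)]

print(unchanged)
print(pages)
```

Note that `str.format` does not raise an error when given extra arguments, which is why the bug produces no warning.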

Answer 1 (score: 1)

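As already pointed out, you are missing the actual format placeholder. If you want all the pages, you can scrape the page count from the initial page and loop over that range rather than hard-coding the number of pages. The count is in the second-to-last li of the pagination list: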

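from bs4 import BeautifulSoup
import requests

def get_pages(url):
    # Fetch the first page; its pagination list tells us how many pages exist.
    soup = BeautifulSoup(requests.get(url).content, "lxml")
    yield soup
    # Append the page_number query parameter used in the question's URL.
    url += "&page_number={}"
    # The second-to-last li in the pagination list holds the last page number.
    for n in range(2, int(soup.select("#js-listings-pagination li")[-2].text) + 1):
        yield BeautifulSoup(requests.get(url.format(n)).content, "lxml")

start = "http://www.mobil123.com/mobil?type=used"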

for soup in get_pages(start):
    print(soup)
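The page-count lookup can be tried offline before pointing it at the live site. The sketch below runs the same selector against an inline HTML snippet. `SAMPLE_HTML` is a made-up stand-in for the site's pagination markup (the real structure may differ), and `last_page_number` and `page_urls` are hypothetical helper names. It uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical pagination markup; an assumption, not copied from the site.
SAMPLE_HTML = """
<ul id="js-listings-pagination">
  <li><a href="#">Prev</a></li>
  <li><a href="#">1</a></li>
  <li><a href="#">2</a></li>
  <li><a href="#">37</a></li>
  <li><a href="#">Next</a></li>
</ul>
"""

def last_page_number(html):
    """Return the page count read from the second-to-last pagination li."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.select("#js-listings-pagination li")
    return int(items[-2].text.strip())

def page_urls(start, last_page):
    """Yield the start URL, then one URL per page from 2 to last_page."""
    yield start
    for n in range(2, last_page + 1):
        yield start + "&page_number={}".format(n)

last = last_page_number(SAMPLE_HTML)
urls = list(page_urls("http://www.mobil123.com/mobil?type=used", 3))
print(last)
print(urls)
```

Separating URL generation from fetching makes it easy to check the generated URLs before sending any requests.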