Trouble moving to the next page

Date: 2017-04-21 17:23:21

Tags: python web-scraping

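When I run my function to grab some links from a site, it gets the links from the first page, but instead of moving on to the next page and doing the same there, it breaks with the error shown below.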

Crawler:

import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        page = requests.get(address)
        tree = html.fromstring(page.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)


    page+=1

Startpoint(5)

Error message:

Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module>
    Startpoint(5)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint
    while page<=mpage:
TypeError: unorderable types: Response() <= int()

2 Answers:

Answer 0 (score: 1)

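You have assigned the result of requests.get(address) to page, so on the next pass through the loop Python cannot compare a requests.Response object with an int. Just give it a name other than page, such as response. Your last line also has an indentation error: page+=1 has to sit inside the while loop.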

import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        response = requests.get(address)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)


        page+=1

Startpoint(5)

Answer 1 (score: 1)

You are overwriting the page variable on this line: page = requests.get(address)

So when execution returns to while page<=mpage: for the second iteration, it tries to compare page (now a Response object) with mpage (an integer).
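To see the overwrite in isolation, here is a minimal offline sketch. FakeResponse and fetch are hypothetical stand-ins for requests.Response and requests.get (they are not part of the original code), used only so the snippet runs without network access. The exact wording of the TypeError depends on the Python version.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response; it defines no ordering."""
    text = "<html></html>"


def fetch(address):
    # Hypothetical stand-in for requests.get(address), so the sketch runs offline.
    return FakeResponse()


def buggy_loop(mpage):
    page = 4
    try:
        while page <= mpage:
            # Same mistake as in the question: the counter is replaced
            # by the response object, so the next comparison fails.
            page = fetch("bindex" + str(page) + ".html")
    except TypeError as exc:
        return str(exc)
    return None


error_text = buggy_loop(5)
print(error_text)
```

The first comparison (4 <= 5) succeeds; the second one compares a FakeResponse with an int and raises TypeError, which matches the traceback in the question.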

Also, page+=1 should be inside the while loop.
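One possible way to avoid both problems is to let range() own the counter, so there is no manual increment to misplace and no counter variable to overwrite. The sketch below only builds the page URLs and applies the same link filter as the question; the helper names page_urls and keep_link are made up for illustration, and the actual fetching and parsing with requests and lxml is left out so the snippet runs offline.

```python
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_urls(first, last):
    """Build the index-page URLs for pages first..last (inclusive)."""
    # range() advances the page number itself, so the loop body
    # cannot forget to increment it or accidentally replace it.
    return [BASE + "bindex" + str(number) + ".html"
            for number in range(first, last + 1)]


def keep_link(href):
    """Same filter as the two nested ifs in the question."""
    return "bindex" not in href and "cdn-cgi" not in href


for url in page_urls(4, 5):
    print(url)
```

Inside the for loop, each url could then be passed to requests.get and the resulting hrefs filtered with keep_link, as in the corrected answer above.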