Question

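When I run my function to scrape certain links from a site, it gets the links from the first page, but instead of moving on to the next page and doing the same, it breaks with the error shown below.

Crawler: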

import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        page = requests.get(address)
        tree = html.fromstring(page.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)


    page+=1

Startpoint(5)

Error message:

Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module>
    Startpoint(5)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint
    while page<=mpage:
TypeError: unorderable types: Response() <= int()

Answer 1

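You assigned the result of requests.get(address) to page, so on the next pass Python cannot compare a requests.Response object with an int. Just give the response a different name, such as response. Your last line (page+=1) also has an indentation error: it needs to be inside the while loop.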

import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        response = requests.get(address)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)


        page+=1

Startpoint(5)

Answer 2

You are overwriting the page variable on this line: page = requests.get(address)

So when execution returns to while page<=mpage: for the second iteration, it tries to compare page (now a Response object) with mpage (an integer).
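The failing comparison can be reproduced without touching the network. The sketch below is only an illustration: FakeResponse is a hypothetical stand-in for requests.Response (it is not part of requests), and compare_after_overwrite is a made-up helper name.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, so no network is needed."""
    text = "<html></html>"


def compare_after_overwrite(mpage):
    page = 4
    # First loop check: int <= int, which works.
    first_check = page <= mpage
    # Same mistake as `page = requests.get(address)`: the counter is replaced.
    page = FakeResponse()
    try:
        # Second loop check: object <= int, which is not defined.
        page <= mpage
    except TypeError as exc:
        return first_check, type(exc).__name__
    return first_check, None


result = compare_after_overwrite(5)
print(result)
```

The first comparison succeeds because page is still an int; the second raises TypeError because the name now refers to an object that defines no ordering against integers.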

Also, page+=1 should be inside the while loop.
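One way to sidestep both problems is to let a for loop over range() own the page counter, so there is no manual increment to misplace and no reason to reuse the name page. This is a minimal sketch, not the original poster's code: BASE and page_addresses are names introduced here for illustration, and the first page number 4 is taken from the question.

```python
# Base URL copied from the question; BASE is a name chosen for this sketch.
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_addresses(first_page, last_page):
    """Yield the index-page URL for each page number, inclusive."""
    for page in range(first_page, last_page + 1):
        # `page` stays an int for the whole loop; range() advances it.
        yield BASE + "bindex" + str(page) + ".html"


addresses = list(page_addresses(4, 5))
for address in addresses:
    print(address)
```

Each address could then be passed to requests.get and the result stored under a separate name such as response, as in the corrected code from the first answer.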

Trouble going to the next page
