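When I run my function to collect certain links from a site, it gets the links from the first page, but instead of moving on to the next page and doing the same, it stops with the error shown below.
Crawler: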
import requests
from lxml import html
def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        page = requests.get(address)
        tree = html.fromstring(page.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)
    page+=1
Startpoint(5)
Error message:
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module>
    Startpoint(5)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint
    while page<=mpage:
TypeError: unorderable types: Response() <= int()
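Answer 0 (score: 1)
You have assigned the result of requests.get(address) to page. Python then cannot compare a requests.Response object with an int. Store the response under a different name, such as response. Your last line also has an indentation error.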
import requests
from lxml import html
def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        # Keep the response in its own variable so page stays an int
        response = requests.get(address)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)
        # Advance the counter inside the while loop
        page+=1
Startpoint(5)
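To see the failure in isolation, here is a minimal sketch that reproduces the same TypeError without any network access. FakeResponse and rebinding_error are hypothetical names used only for illustration; FakeResponse stands in for requests.Response on the assumption that, like any plain object, it defines no ordering against int.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response (defines no ordering)."""


def rebinding_error(mpage):
    page = 4
    # Same mistake as in the question: the counter name is rebound
    # to a response-like object.
    page = FakeResponse()
    try:
        # This is the comparison the while loop performs on its second pass.
        page <= mpage
    except TypeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(repr(rebinding_error(5)))
```

The exact wording of the message depends on the Python version, but the exception type is TypeError in every Python 3 release.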
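Answer 1 (score: 1)
You are overwriting the page variable on this line: page = requests.get(address)
So when execution returns to while page<=mpage: for the second iteration, it tries to compare page (now a response object) with mpage (an integer).
Also, page+=1 should be inside the while loop.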
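One way to avoid both problems at once is to let a for loop over range own the page counter, so there is no variable to overwrite and no increment to misplace. The sketch below is only an illustration under that assumption: page_urls and profile_links are hypothetical helper names, and the code builds and filters strings without making any request.

```python
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_urls(first, last):
    """Build the index page URLs for pages first..last (inclusive)."""
    urls = []
    for number in range(first, last + 1):
        urls.append(BASE + "bindex" + str(number) + ".html")
    return urls


def profile_links(hrefs):
    """Drop pagination and cdn-cgi links, then prefix the rest with BASE."""
    return [
        BASE + href
        for href in hrefs
        if "bindex" not in href and "cdn-cgi" not in href
    ]


if __name__ == "__main__":
    for url in page_urls(4, 5):
        print(url)
```

Each URL from page_urls can then be passed to requests.get, and the hrefs returned by tree.xpath('//p/a/@href') can be passed to profile_links.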