I want to fetch and parse every page, continuing until the response (the link variable) no longer contains a "load more" URL. Can anyone tell me how to modify the following code to do that?
import urllib2,re
Fromurl = "https://somesite.com/n/series/123456/"
req = urllib2.Request(Fromurl)
req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36')
response = urllib2.urlopen(req)
link=response.read()
print "value of link here:"
print link
response.close()
# try to parse the response here
#################### here we get the next page URL ###################
matchNextPageUrl = re.findall('loadmore', link, re.UNICODE)
print "value of matchNextPageUrl [0][0]"
print "https://somesite.com/n/series/nexpage"+matchNextPageUrl[0][0]+".sort-number:DESC.pageNumber-1";