Snip of my output! I am getting this error and cannot resolve it. Here is my code:
import scrapy

class torrentSpider(scrapy.Spider):
    name = 'torrent'
    start_urls = ['https://www.1337x.to/series-library/b/1/']
    page_number = 2

    def parse(self,response):
        href = response.xpath('.//div[@class="movie-info"]/h3/a/@href').extract()
        for urls in href:
            yield {"Linkss" : "https://1337x.to" + urls}
        for alphabets in list(map(chr, range(ord('a'), ord('z')+1))):
            alpha_url = f'https://www.1337x.to/series-library/{alphabets}/1/'
            last_page = alpha_url.xpath('.//div[@class="pagination"]/ul/li/a/text()')[-2].extract()
            for numbers in str(self.page_number):
                next_page = "https://www.1337x.to/series-library/" + alphabets + "/" + str(numbers)+"/"
                if self.page_number <= int(last_page) :
                    self.page_number += 1
                    yield response.follow(next_page,callback=self.parse,dont_filter = True )
I have already tried removing the line "last_page = alpha_url.xpath('.//div[@class="pagination"]/ul/li/a/text()')[-2].extract()",
but it still does not work. Any help would be greatly appreciated.
Answer 0 (score: 0)
Looking at your code:
alpha_url = f'https://www.1337x.to/series-library/{alphabets}/1/'
last_page = alpha_url.xpath(...)...
You explicitly set alpha_url to a string. On the next line, you try to call a method on it that strings do not have.
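The point can be reproduced without Scrapy at all. This is a small illustration, not the asker's actual traceback: it checks that a plain string has no xpath attribute and captures the message Python produces when the call is attempted. The variable error_message is introduced here only for the demonstration.

```python
alpha_url = "https://www.1337x.to/series-library/a/1/"

# A str is just text; it carries no .xpath() method.
print(hasattr(alpha_url, "xpath"))  # False

try:
    alpha_url.xpath('.//div[@class="pagination"]/ul/li/a/text()')
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'str' object has no attribute 'xpath'
```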
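You have to download the page first so that you have its HTML as a response object (just as Scrapy already does for response above), and then call xpath on that response. xpath cannot operate on a variable that only holds a URL string.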