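I'm practicing Scrapy and have a question: I want to crawl the links that my spider extracts, but I don't know how.
My code is below.
As you can see, the links I scrape are stored in the variable movie_descriptionTW_URL.
I wrote yield Request(movie_descriptionTW, parse_detail)
to send the result to this method:
    def parse_detail(self, response):
        print(response.url)
But I get an error: exceptions.NameError: global name 'parse_detail' is not defined
How can I fix this?
Please teach me! Thanks.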
from scrapy.spider import Spider
from scrapy.selector import Selector
from yahoo.items import YahooItem
from scrapy.http.request import Request

class MySpider(Spider):
    name = "yahoogo"
    start_urls = ["https://tw.movies.yahoo.com/chart.html"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath("//tr")
        items = []
        for site in sites:
            item = YahooItem()
            ranking_list = site.xpath("td[@class='c1']/span/text()").extract()
            movie_descriptionTW = site.xpath("(td[@class='c3']/*//a)[position() < last()-1]/text() | td[@class='c3']/a[1]/text() ").extract()
            movie_descriptionTW_URL = site.xpath("(td[@class='c3']/*//a[2]/@href) | td[@class='c3']/a[1]/@href ").extract()
            # crawl again!
            yield Request(movie_descriptionTW, parse_detail)
            if ranking_list:
                items.append(item)
        yield items

    def parse_detail(self, response):
        print(response.url)
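Answer 0: (score: 0)
Use self.parse_detail to refer to the method. parse_detail is defined on the spider class, so inside parse it has to be reached through self; a bare parse_detail is looked up as a global name, which is why the NameError is raised.
Also, extract() returns a list, so request each URL from movie_descriptionTW_URL individually, like this: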
    for url in movie_descriptionTW_URL:
        yield Request(url=url, callback=self.parse_detail)
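
To see why the bare name fails and self.parse_detail works, here is a minimal sketch that runs without Scrapy installed. The Request class below is a hypothetical stand-in that only stores a URL and a callback (it is not Scrapy's real Request), and BrokenSpider, FixedSpider and the example.com URLs are made up for illustration.

```python
class Request:
    """Hypothetical stand-in for scrapy's Request: stores a URL and a callback."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


class BrokenSpider:
    def parse(self, urls):
        for url in urls:
            # A bare name is looked up in local, enclosing-function, global and
            # builtin scopes, never in the class body, so this raises NameError.
            yield Request(url, parse_detail)

    def parse_detail(self, response):
        return response.url


class FixedSpider:
    def parse(self, urls):
        for url in urls:
            # self.parse_detail is a bound method, so whoever receives the
            # request can later call the callback with just the response.
            yield Request(url=url, callback=self.parse_detail)

    def parse_detail(self, response):
        return response.url


urls = ["https://example.com/a", "https://example.com/b"]

# The generator body only runs when it is consumed, so the error appears here.
try:
    list(BrokenSpider().parse(urls))
    broken_error = None
except NameError as exc:
    broken_error = exc

requests = list(FixedSpider().parse(urls))
# For this sketch the request itself stands in for the response object.
visited = [req.callback(req) for req in requests]

print(type(broken_error).__name__)
print(visited)
```

In a real spider, Scrapy schedules each yielded Request and calls the callback with the downloaded response, so the only change needed in the question's code is to pass self.parse_detail and to loop over the extracted URLs.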