Scrapy: How do I crawl the URLs I get from my spider? exceptions.NameError: global name 'parse_detail' is not defined

Asked: 2014-07-24 05:05:10

Tags: python scrapy yield nameerror

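I am practicing with Scrapy and have a question: I want to crawl the links that my spider extracts, but I don't know how.

Here is my code. As you can see, the links I scrape are stored in the variable movie_descriptionTW_URL.
I wrote yield Request(movie_descriptionTW, parse_detail) to send each response to this method: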

def parse_detail(self, response):
    print(response.url)

But I get an error: exceptions.NameError: global name 'parse_detail' is not defined
How can I fix this?
Please help. Thanks!

from scrapy.spider import Spider
from scrapy.selector import Selector
from yahoo.items import YahooItem
from scrapy.http.request import Request   

class MySpider(Spider):   
    name = "yahoogo"
    start_urls = ["https://tw.movies.yahoo.com/chart.html"]  

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath("//tr")
        items = []
        for site in sites:
            item = YahooItem()
            ranking_list = site.xpath("td[@class='c1']/span/text()").extract()
            movie_descriptionTW  = site.xpath("(td[@class='c3']/*//a)[position() < last()-1]/text() | td[@class='c3']/a[1]/text() ").extract()
            movie_descriptionTW_URL = site.xpath("(td[@class='c3']/*//a[2]/@href) | td[@class='c3']/a[1]/@href ").extract()   

            # crawl again!
            yield Request(movie_descriptionTW, parse_detail)

            if ranking_list:    
                items.append(item)
        yield items     

    def parse_detail(self, response):
        print(response.url)

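1 Answer:

Answer 0 (score: 0)

Use self.parse_detail to refer to the method. A bare parse_detail is looked up as a global name, which is why the NameError is raised. Because extract() returns a list, yield one Request per URL, like this: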

for url in movie_descriptionTW_URL:
    yield Request(url=url, callback=self.parse_detail)
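
The cause of the NameError can be reproduced without Scrapy. The following is a minimal sketch, not the asker's spider: the class Demo and the method parse_fixed are hypothetical names made up for illustration, and parse_detail simply returns its argument. It shows that a bare name used inside a method is not resolved against the class body, while the same name reached through self is.

```python
class Demo(object):  # hypothetical stand-in for the spider class
    def parse(self):
        # A bare name is resolved in the local, enclosing-function, global
        # and builtin scopes. The class body is not one of those scopes,
        # so this lookup fails even though parse_detail is defined below.
        try:
            return parse_detail  # noqa: F821 (intentionally undefined)
        except NameError as exc:
            return "NameError: %s" % exc

    def parse_fixed(self):  # hypothetical name, illustrates the fix
        # Attribute access through self finds the method on the class
        # and returns it bound to this instance.
        return self.parse_detail

    def parse_detail(self, response):
        # Placeholder body: just hand the argument back.
        return response


demo = Demo()
bare_lookup = demo.parse()     # a string describing the NameError
callback = demo.parse_fixed()  # a bound method, usable as a callback

print(bare_lookup)
print(callback("https://tw.movies.yahoo.com/chart.html"))
```

The bound method returned by self.parse_detail already carries the instance, so whatever calls it later only has to supply the response argument. That is what makes it suitable as the callback passed to Request.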