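When I run this code in the terminal, it only crawls the first page. It does not follow any of the other links from the start URL. I'm not good with regular expressions, so could that be the cause? I'm following a YouTube tutorial whose code is almost identical to mine, and it works fine there, so I'm not sure what the problem is.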
from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from ScrapBooks.items import ScrapbooksItem


class AlibrisspiderSpider(CrawlSpider):
    name = "as"
    allowed_domains = ["alibris.com"]
    start_urls = ["https://www.alibris.com/search/books/subject/mystery/"]
    rules = (
        Rule(SgmlLinkExtractor(allow="www\.alibris\.com.*"),
             callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        sel = Selector(response)
        item = ScrapbooksItem()
        item['URL'] = response.request.url
        item['bookLink'] = sel.xpath('//*[@id="selected-works"]/ul/li/a').extract()
        self.log("********* Inside Parse Method ********")
        return item
Below is my items.py class:
import scrapy
from scrapy.item import Item, Field


class ScrapbooksItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    URL = Field()
    bookLink = Field()
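On the regular expression question, here is a quick standalone check using Python's re module. It assumes the link extractor tests each URL against the allow pattern with search semantics, which is an assumption about Scrapy's internals and worth confirming in its documentation. Only the first URL comes from the spider; the other two are hypothetical examples.

```python
import re

# The allow pattern from the spider's rule, written as a raw string.
pattern = re.compile(r"www\.alibris\.com.*")

# The first URL is the start URL from the spider; the other two are
# hypothetical examples used only for illustration.
urls = [
    "https://www.alibris.com/search/books/subject/mystery/",
    "https://www.alibris.com/some/other/page",
    "https://www.example.com/",
]

# search() succeeds if the pattern occurs anywhere in the URL.
matches = {url: bool(pattern.search(url)) for url in urls}
for url, ok in matches.items():
    print(ok, url)
```

If the pattern accepts every www.alibris.com URL under that assumption, the regular expression alone seems unlikely to explain why only the first page is crawled.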
Answer 0 (score: 1)
Don't return the item. Use yield instead of return at the end of parse_item.
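A minimal sketch of that change, written without Scrapy so it runs on its own. FakeRequest and FakeResponse are hypothetical stand-ins for Scrapy's request and response objects, and a plain dict stands in for ScrapbooksItem; none of these names exist in Scrapy.

```python
class FakeRequest:
    """Hypothetical stand-in for a Scrapy request (not part of Scrapy)."""

    def __init__(self, url):
        self.url = url


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (not part of Scrapy)."""

    def __init__(self, url):
        self.request = FakeRequest(url)


def parse_item(response):
    # A plain dict stands in for ScrapbooksItem in this sketch.
    item = {}
    item['URL'] = response.request.url
    # yield makes parse_item a generator: the caller iterates over it
    # to collect the item(s) rather than receiving one return value.
    yield item


response = FakeResponse("https://www.alibris.com/search/books/subject/mystery/")
items = list(parse_item(response))
print(items)
```

In the spider itself, the equivalent edit would be replacing `return item` with `yield item` on the last line of parse_item.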