I got this code from a website:
import scrapy


class BrickSetSpider(scrapy.Spider):
    name = "brickset_spider"
    start_urls = ['http://brickset.com/sets/year-2016']

    def parse(self, response):
        SET_SELECTOR = '.set'
        for brickset in response.css(SET_SELECTOR):
            NAME_SELECTOR = 'h1 a ::text'
            yield {
                'name': brickset.css(NAME_SELECTOR).extract(),
            }
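The name is the result of the extract() method. This is the inspected element (in Chrome):
I would like to know how to get the name as either "10805: Around the World" or just "Around the World". How can I do that?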
Answer 0 (score: 1)
To get "10805: Around the World", change your yield to:
yield {
    'name': " ".join(brickset.css(NAME_SELECTOR).extract()),
}
To get just "Around the World", change your yield to:
yield {
    'name': brickset.css(NAME_SELECTOR).extract()[-1],
}
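To see what the two options do without running the spider, here is a minimal sketch in plain Python. The `fragments` list is an assumption: it stands in for what `brickset.css(NAME_SELECTOR).extract()` might return if the set number and the title are separate text nodes. The actual list depends on the page's markup. The names `full_name` and `short_name` are only for illustration.

```python
# Hypothetical stand-in for brickset.css(NAME_SELECTOR).extract();
# the actual fragments depend on the page's HTML.
fragments = ["10805:", " Around the World"]

# Option 1: number and title together.
# Stripping each fragment first avoids a doubled space when a
# fragment already starts or ends with whitespace.
full_name = " ".join(part.strip() for part in fragments)

# Option 2: title only, i.e. the last text fragment.
short_name = fragments[-1].strip()

print(full_name)
print(short_name)
```

Note that `extract()[-1]` raises an IndexError if the selector matches nothing, so it may be worth checking that the list is non-empty before indexing.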