I want to prepend a base URL string to the value stored in the item object:
item['urls'] = sel.xpath('a/@href').extract()
For example:
item['urls'] = "http://lakmeindia.com" + sel.xpath('a/@href').extract()
# Item class
import scrapy


class LakmeSampleItem(scrapy.Item):
    urls = scrapy.Field()
    catagory = scrapy.Field()
    sub_category = scrapy.Field()
# lakme Spider
import scrapy
from LakmeProject.items import LakmeSampleItem


class LakmeSpider(scrapy.Spider):
    name = "lakme"
    allowed_domains = ["lakmeindia.com"]
    start_urls = [
        "http://www.lakmeindia.com/sitemap"
    ]

    def parse(self, response):
        for sel in response.xpath("//div[@class='make-up']/ul[1]/li"):
            item = LakmeSampleItem()
            item['sub_category'] = sel.xpath('span/text()').extract()
            # Here I want to prepend the base URL, because the extracted
            # URL is relative and comes back as a list like ['/sitemap'].
            item['urls'] = sel.xpath('a/@href').extract()
            item['catagory'] = "Lakme Absolute"
            yield item
Answer 0 (score: 3)
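You are on the right track. You just need to be aware that extract() returns a list, and a string cannot be concatenated with a list. So what you actually need to do is: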
item['urls'] = "http://lakmeindia.com" + sel.xpath('a/@href').extract()[0]
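That is, index the result of extract() with [0] to get the first item in the list. Note that [0] raises an IndexError if the XPath matched nothing.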
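As a rough sketch of the same idea outside a spider, the snippet below joins the first extracted href onto a base URL and guards against an empty list. The helper name first_absolute_url and the BASE_URL constant are made up for illustration, and the hrefs list stands in for what sel.xpath('a/@href').extract() would return. It uses urljoin from the standard library rather than plain string concatenation, so hrefs that are already absolute are left unchanged.

```python
from urllib.parse import urljoin

# Assumption: the base is taken from the spider's start_urls host.
BASE_URL = "http://www.lakmeindia.com"


def first_absolute_url(hrefs, base=BASE_URL):
    """Join the first href in the list onto base.

    `hrefs` plays the role of the list returned by extract().
    Returns None when the list is empty instead of raising IndexError.
    """
    if not hrefs:
        return None
    return urljoin(base, hrefs[0])


if __name__ == "__main__":
    print(first_absolute_url(["/sitemap"]))
    print(first_absolute_url([]))
```

Inside the parse method, this would be used as item['urls'] = first_absolute_url(sel.xpath('a/@href').extract()). Recent Scrapy versions also offer response.urljoin() and extract_first(), which cover the same ground without a helper.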