Prepending a string URL to an item field in Python Scrapy code

Asked: 2014-08-19 15:14:05

Tags: python-2.7 scrapy

I want to prepend a string URL to a field of the item object
(item['urls'] = sel.xpath('a/@href').extract()).

For example:
item['urls'] = "http://lakmeindia.com" + sel.xpath('a/@href').extract()

# Item class

import scrapy

class LakmeSampleItem(scrapy.Item):
    urls = scrapy.Field()
    catagory = scrapy.Field()
    sub_category = scrapy.Field()

# lakme Spider
import scrapy

from LakmeProject.items import LakmeSampleItem

class LakmeSpider(scrapy.Spider):
    name = "lakme"
    allowed_domains = ["lakmeindia.com"]
    start_urls = [
        "http://www.lakmeindia.com/sitemap"
    ]

    def parse(self, response):
        for sel in response.xpath("//div[@class='make-up']/ul[1]/li"):
            item = LakmeSampleItem()
            item['sub_category'] = sel.xpath('span/text()').extract()
            # I want to prepend the site URL here, because the href comes back relative, e.g. ['/sitemap']
            item['urls'] = sel.xpath('a/@href').extract()
            item['catagory'] = "Lakme Absolute"
            yield item

1 answer:

Answer 0 (score: 3)

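You are on the right track. You just need to be aware that extract() returns a list, and a string cannot be concatenated with a list. So what you actually need is: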

item['urls'] = "http://lakmeindia.com" + sel.xpath('a/@href').extract()[0]

That is, index the result returned by extract() with [0] to get the first item in the list.
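
Note that [0] raises an IndexError when the XPath matches nothing, and it keeps only the first link. As a minimal sketch of an alternative, assuming you want every href made absolute, the standard library's urljoin can be mapped over the whole list. The names BASE_URL and absolute_urls are hypothetical helpers introduced for illustration, and the hrefs list stands in for what sel.xpath('a/@href').extract() would return:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

# Assumed base URL, taken from the question's example.
BASE_URL = "http://lakmeindia.com"


def absolute_urls(hrefs, base=BASE_URL):
    """Join each relative href onto base; an empty list stays empty."""
    return [urljoin(base, href) for href in hrefs]


# Stand-in for the list returned by sel.xpath('a/@href').extract()
hrefs = ["/sitemap"]
print(absolute_urls(hrefs))
```

In the spider this could be used as item['urls'] = absolute_urls(sel.xpath('a/@href').extract()). Unlike plain string concatenation, urljoin also leaves hrefs that are already absolute unchanged.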