Question

This is my Scrapy spider code. I am trying to extract metadata values from a website. Each metadata item appears at most once on a page.

import scrapy
from scrapy import Request


class MySpider(scrapy.Spider):
    name = "courses"
    start_urls = ['http://www.example.com/listing']
    allowed_domains = ["example.com"]

    def parse(self, response):
        # One item is yielded per <meta> element; the XPaths below are
        # absolute (//meta...), so every item reads the same page-level values.
        for courses in response.xpath("//meta"):
            yield {
                'ScoreA': courses.xpath('//meta[@name="atarbur"]/@content').extract_first(),
                'ScoreB': courses.xpath('//meta[@name="atywater"]/@content').extract_first(),
                'ScoreC': courses.xpath('//meta[@name="atarsater"]/@content').extract_first(),
                'ScoreD': courses.xpath('//meta[@name="clearlywaur"]/@content').extract_first(),
            }
        # Follow the listing links and parse them with the same callback.
        for url in response.xpath('//ul[@class="scrapy"]/li/a/@href').extract():
            yield Request(response.urljoin(url), callback=self.parse)

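What I want to achieve is this: if the value of any of the Scores is an empty string (''), I want to replace it with 0 (zero). I am not sure how to add conditional logic inside the 'yield' block.

Any help is much appreciated.

Thanks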

Answer 1

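The extract_first() method takes an optional default-value argument, but that default is only returned when the XPath matches nothing. It does not help when the attribute exists and is an empty string, so in your case you can simply use an or expression: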

foo = response.xpath('//foo').extract_first('').strip() or 0

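Here, if extract_first() returns a string with no text in it (or only whitespace, which strip() removes), that string evaluates to False, so the or expression yields its last operand (0) instead.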
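The fallback behaviour of the expression above can be checked without Scrapy. This is a minimal sketch on plain strings; or_zero is a hypothetical helper name, and the extra handling of None is an assumption added for the case where extract_first() is called without a default.

```python
def or_zero(text):
    """Return the stripped text, or 0 when it is None, empty, or whitespace-only."""
    # (text or '') turns None into '' so that .strip() is always safe to call.
    # An empty string is falsy, so `or 0` substitutes 0 for it.
    return (text or '').strip() or 0


print(or_zero(' 85 '))  # non-empty text is kept (stripped)
print(or_zero(''))      # empty string falls back to 0
print(or_zero('   '))   # whitespace-only falls back to 0
print(or_zero(None))    # missing value falls back to 0
```

Note that a non-empty result is still a string at this point; only the fallback is an integer.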

To convert the string to another type, such as an integer, try:

foo = int(response.xpath('//foo').extract_first('').strip() or 0)
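Applied to the item from the question, the same idea can be wrapped in a small helper so the yield block stays readable. The following is a sketch only, assuming the scores are whole numbers. The names to_int, FIELDS, and build_item are hypothetical, and get_content stands in for a call such as response.xpath(...).extract_first('') on the real page. Unlike the bare int(...) call, it also falls back to the default when the text is not a valid integer.

```python
def to_int(text, default=0):
    """Convert scraped text to int; use default for empty or non-integer text."""
    cleaned = (text or '').strip()
    if not cleaned:
        return default
    try:
        return int(cleaned)
    except ValueError:
        # int() cannot parse text such as 'N/A' or '85.5'.
        return default


# Item keys mapped to the meta names used in the question's spider.
FIELDS = {
    'ScoreA': 'atarbur',
    'ScoreB': 'atywater',
    'ScoreC': 'atarsater',
    'ScoreD': 'clearlywaur',
}


def build_item(get_content):
    """Build the item dict; get_content(meta_name) returns the raw content string."""
    return {key: to_int(get_content(name)) for key, name in FIELDS.items()}


# Stand-in for a page where one meta tag is empty and another is missing.
fake_page = {'atarbur': '90', 'atywater': '', 'atarsater': '  75 '}
item = build_item(lambda name: fake_page.get(name))
print(item)
```

Inside parse(), the equivalent would be to yield build_item(...) with a function that runs the '//meta[@name="..."]/@content' XPath for the given name and calls extract_first('') on it.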
