Question

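Setup

Using Scrapy, I scrape housing ads.

From the ad-overview page, I get a list of hrefs that link to the individual ads. In a for loop, each href is sent to a second parse function to obtain the housing characteristics of that ad.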

def parse(self, response):
    # Loop over the hrefs collected from the ad-overview page
    for href in response.xpath('//*[@id]/@href').extract()[1:-1]:
        yield scrapy.Request(response.urljoin(href),
                             callback=self.parse_ad)

def parse_ad(self, response):
    # Code that obtains the housing characteristics per ad goes here

    yield {'char1': char1,
           'char2': char2}

This works fine.

Problem

Besides the hrefs, I also get a list of postal codes from the ad-overview page:

response.xpath('//*[@id]/div[1]/div/div[1]/div[1]/div[2]/meta').extract()

Ultimately, I would like to have:

    yield {'char1': char1,
           'char2': char2,
           'postal code': postal_code}

But I am not sure how to

1. make Python pick each href together with its corresponding postal_code, and
2. pass postal_code on to the yield in parse_ad().

How can I do this?

Answer 1

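To "carry over" data from one callback method to another, use meta: pass a dict as the meta argument of scrapy.Request in parse(), then read it back from response.meta in parse_ad().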

