I have two pieces of Scrapy code that produce different results.
Both code samples share these two lists:
b_result_list = []
b_result_page = []
1) b_result_page:
NEXT_PAGE_SELECTOR = 'a.sb_pagN ::attr(href)'
next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
if next_page:
    yield scrapy.Request(
        response.urljoin(next_page),
        callback=self.parse
    )
    b_result_page.append(next_page)
Sample data produced:
['b.com/search?q=site%3asite.com&first=11&FORM=PORE',
 'b.com/search?q=site%3asite.com&first=21&FORM=PORE']
2) b_result_list:
LIST_SELECTOR = '.b_algo'
for bresult in response.css(LIST_SELECTOR):
    NAME_SELECTOR = 'h2 a ::attr(href)'
    yield {
        # Use NAME_SELECTOR here so the item holds the link hrefs
        'name': bresult.css(NAME_SELECTOR).extract(),
    }
    b_result_list.append(bresult)
Sample data produced:
['somesite.com', 'blog.somesite.com', 'somesite.com/about/contactus.php']
Question: how can I do the following (I can't wrap my head around it)?
For each page in b_result_page, extract the links from that page and put them into b_result_list.
Would code like this solve my problem?
for brp in b_result_page:
    LIST_SELECT = '.b_algo'
    for page_item_result in response.css(LIST_SELECT):
        NAME_SELECT = 'h2 a ::attr(href)'
        yield {
            'name': page_item_result.css(NAME_SELECT).extract(),
        }
        b_result_list.append(page_item_result)
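
One way to approach this, offered as a rough sketch under stated assumptions rather than a tested solution: instead of collecting page URLs first and looping over them afterwards, let the same callback extract the links from whatever page it receives and then queue the next page. The mixin name `ResultPagesMixin` is hypothetical, the lists hold href strings rather than selector objects, and `response.follow` (a Scrapy response method that resolves relative URLs) is swapped in for `scrapy.Request(response.urljoin(...))`.

```python
LIST_SELECTOR = '.b_algo'
NAME_SELECTOR = 'h2 a ::attr(href)'
NEXT_PAGE_SELECTOR = 'a.sb_pagN ::attr(href)'

b_result_list = []  # link hrefs found on every page visited
b_result_page = []  # next-page URLs that were queued


class ResultPagesMixin:
    """Hypothetical mixin; meant to be combined with scrapy.Spider."""

    def parse(self, response):
        # Step 1: extract the result links from the page this
        # response belongs to.
        for bresult in response.css(LIST_SELECTOR):
            link = bresult.css(NAME_SELECTOR).extract_first()
            if link:
                b_result_list.append(link)
                yield {'name': link}

        # Step 2: queue the next page. Scrapy calls this same
        # callback on it, so step 1 runs again for that page.
        next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
        if next_page:
            b_result_page.append(next_page)
            yield response.follow(next_page, callback=self.parse)
```

To use it, the mixin would be combined with a spider, for example `class BSpider(ResultPagesMixin, scrapy.Spider)` with its own `name` and `start_urls`. Scrapy then calls `parse` once per page, so `b_result_list` fills up page by page and no separate loop over `b_result_page` is needed.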
Thanks.