I have a Scrapy script as follows.
1) Collect the navigation paths into a list and schedule a new parse callback for each:
    g_next_page_list = []
    g_next_page_set = set()

    def parse(self, response):
        # code to extract nav_links
        for nav_link in nav_links:
            if nav_link not in g_next_page_set:
                g_next_page_list.append(nav_link)
                g_next_page_set.add(nav_link)
        for next_page in g_next_page_list:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse_start_url, dont_filter=True)
I have defined parse_start_url as:
    def parse_start_url(self, response):
        # code to extract nav_links
        for nav_link in nav_links:
            if nav_link not in g_next_page_set:
                g_next_page_list.append(nav_link)
                g_next_page_set.add(nav_link)
However, the global list and set used by the main parse (g_next_page_list, g_next_page_set) are not being appended to. What am I doing wrong?
Thanks in advance!
Answer 0 (score: 1)
Don't use globals here; make them attributes of the spider and access them as self.variable_name:
    # Defined in the spider class body, so both callbacks reach them through self
    g_next_page_list = []
    g_next_page_set = set()

    def parse(self, response):
        # code to extract nav_links
        for nav_link in nav_links:
            if nav_link not in self.g_next_page_set:
                self.g_next_page_list.append(nav_link)
                self.g_next_page_set.add(nav_link)
        for next_page in self.g_next_page_list:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse_start_url, dont_filter=True)

    def parse_start_url(self, response):
        # code to extract nav_links
        for nav_link in nav_links:
            if nav_link not in self.g_next_page_set:
                self.g_next_page_list.append(nav_link)
                self.g_next_page_set.add(nav_link)
That should do it.
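To see the bookkeeping in isolation, here is a minimal framework-free sketch of the same list-plus-set pattern with the state kept on self. The class name NavTracker, the method collect, and the example.com URLs are hypothetical and not part of the original spider; urljoin from the standard library stands in for response.urljoin.

```python
from urllib.parse import urljoin


class NavTracker:
    """Hypothetical stand-in for the spider; holds the discovered links on the instance."""

    def __init__(self):
        # Shared state lives on the instance and is reached through self.
        self.g_next_page_list = []
        self.g_next_page_set = set()

    def collect(self, base_url, nav_links):
        """Record links not seen before and return the newly added absolute URLs."""
        new_urls = []
        for nav_link in nav_links:
            url = urljoin(base_url, nav_link)
            if url not in self.g_next_page_set:
                self.g_next_page_list.append(url)
                self.g_next_page_set.add(url)
                new_urls.append(url)
        return new_urls


tracker = NavTracker()
# First "page": the duplicate "/a" is recorded only once.
first = tracker.collect("http://example.com/", ["/a", "/b", "/a"])
# Second "page": "/b" is already known, so only "/c" is new.
second = tracker.collect("http://example.com/a", ["/b", "/c"])
```

In a real spider, each callback would probably need to yield a scrapy.Request for every URL that comes back as new. Otherwise links first discovered inside parse_start_url are stored in the list but never requested, because the loop in parse has already finished by the time that callback runs.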