I'm having trouble with a dict that gets yielded to two different functions (inside a for loop). In my case I create the dict in one function and sometimes yield it to two callbacks: function1 fills in 4 keys (then writes to csv), and function2 fills in 6 keys (4 of them the same as function1's, then writes to csv). The values of the 4 shared keys are always correct, but the 2 extra keys get copied into function1's output whenever function2 is also yielded. Run separately, both functions behave correctly.
The code is a Scrapy crawl spider run with scrapy crawl test -o test.csv.
I fixed it by adding the two missing keys in function1 as well (filled with empty values). My question is: why does this overlap happen?
Smaller functions that show the same behavior (hoping Google returns the same results for you as it does for me):
# -*- coding: utf-8 -*-
from scrapy.spiders import CrawlSpider
from scrapy.http.request import Request
from urllib.parse import urlparse


class Stackoverflow(CrawlSpider):
    name = 'test'
    start_urls = ['https://www.google.com/search?q=SQL',
                  'https://www.google.com/search?q=hello+world']

    def parse(self, response):
        item = dict()
        item['hello world'] = 123
        links = response.xpath('//div[@class="r"]/a')
        temp_list = []
        for link in links:
            small_dict = dict()
            href = link.xpath('@href').extract_first()
            if "wikipedia.org" in urlparse(href).netloc:
                small_dict['wikipedia'] = href
                temp_list.append(small_dict)
            if "learnpython.org" in urlparse(href).netloc:
                small_dict['learnpython'] = href
                temp_list.append(small_dict)
        for my_dict in temp_list:
            for func, link in my_dict.items():
                yield Request(link, callback=getattr(self, func), meta={'item': item})

    def wikipedia(self, response):
        item = response.meta['item']
        item['title'] = response.xpath('//title/text()').extract_first()
        image = response.xpath('//a[@class="image"]/img/@src').extract_first()
        item['image_title'] = response.urljoin(image)
        yield item

    def learnpython(self, response):
        item = response.meta['item']
        item['title'] = response.xpath('//title/text()').extract_first()
        yield item