I have a dictionary features = {'feature1': 'hi', 'feature2': 'second feature', 'feature3': 'third feature'} that I need to save to a CSV file. However, the dictionary is updated on each iteration, and each new dictionary should be appended to the existing CSV file. I am using Scrapy for this.
import scrapy
import pandas as pd
from scrapy.spiders import SitemapSpider

class Myspider(SitemapSpider):
    name = 'spidername'
    sitemap_urls = ['https://www.arabam.com/sitemap/otomobil_1.xml']
    sitemap_rules = [
        ('/otomobil/', 'parse'),
        # ('/category/', 'parse_category'),
    ]

    def parse(self, response):
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        features = {}
        features["ad_url"] = response.request.url
        # ... filling the feature dictionary ...
        df = pd.DataFrame.from_dict(features, orient='index')
        df = df.transpose()
        df.to_csv("result.csv", mode='a', index=False)
The problem is that this also writes the dictionary keys (the header row) to the CSV on every append. I have attached a screenshot of the resulting Excel sheet.
Intuitively, the header should be written only once at the top, not repeated between the data rows. How can I do this?
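One way to keep the pandas approach from the question while writing the header only once is to pass header= based on whether the file already exists. This is a minimal sketch; the append_row helper and the sample dictionaries are illustrative, not part of the original code:

```python
import os
import pandas as pd

CSV_PATH = "result.csv"

# Start fresh so the demo is repeatable
if os.path.exists(CSV_PATH):
    os.remove(CSV_PATH)

def append_row(features, path=CSV_PATH):
    # Write the header row only when the file does not exist yet;
    # every later append adds data rows only.
    df = pd.DataFrame([features])
    df.to_csv(path, mode="a", index=False, header=not os.path.exists(path))

append_row({"feature1": "hi", "feature2": "second feature", "feature3": "third feature"})
append_row({"feature1": "bye", "feature2": "other", "feature3": "values"})
```

After both calls, result.csv contains one header line followed by two data rows.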
Answer 0 (score: 0)
class Myspider(SitemapSpider):
    name = 'spidername'
    sitemap_urls = ['https://www.arabam.com/sitemap/otomobil_1.xml']
    sitemap_rules = [
        ('/otomobil/', 'parse'),
        # ('/category/', 'parse_category'),
    ]
    custom_settings = {'FEED_FORMAT': 'csv', 'FEED_URI': 'FILEname.csv'}

    def parse(self, response):
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        item = {}
        item["ad_url"] = response.request.url
        yield item
Then run: scrapy crawl spidername
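This works because Scrapy's CSV feed exporter keeps the output file open for the whole crawl, writes the header a single time, and then appends one row per yielded item. A rough sketch of that pattern in plain Python (the items.csv file name and the example URLs are illustrative assumptions):

```python
import csv

# One open file, one header, many rows -- roughly what the feed
# exporter does with the items yielded by parse_dir_contents.
with open("items.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ad_url"])
    writer.writeheader()                      # header written exactly once
    for url in ["https://example.com/a", "https://example.com/b"]:
        writer.writerow({"ad_url": url})      # one row per yielded item
```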