I made a web crawler that visits https://www.cartoon3rbi.net/cats.html. Following its rules, it first opens each show's link and grabs the show's title in the title_parse method, then, via the third rule, opens each episode's link and grabs the episode's name. It works fine; I just need to know how to create a separate CSV file for each show's episode names, using the title extracted in the title_parse method as the CSV file's name. Any suggestions?
# -*- coding: utf-8 -*-
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class FfySpider(CrawlSpider):
    custom_settings = {
        'CONCURRENT_REQUESTS': 1
    }
    name = 'FFy'
    allowed_domains = ['cartoon3rbi.net']
    start_urls = ['https://www.cartoon3rbi.net/cats.html']

    rules = (
        # Follow the pagination links on the category pages.
        Rule(LinkExtractor(restrict_xpaths='//div[@class="pagination"]/a[last()]'), follow=True),
        # Open each show's page and grab its title.
        Rule(LinkExtractor(restrict_xpaths='//div[@class="cartoon_cat"]'), callback='title_parse', follow=True),
        # Open each episode's page and grab its name.
        Rule(LinkExtractor(restrict_xpaths='//div[@class="cartoon_eps_name"]'), callback='parse_item', follow=True),
    )

    def title_parse(self, response):
        title = response.xpath('//div[@class="sidebar_title"][1]/text()').extract()

    def parse_item(self, response):
        for el in response.xpath('//div[@id="topme"]'):
            yield {
                # Relative XPath (.//) so the name comes from the
                # current element rather than the whole page.
                'name': el.xpath('.//div[@class="block_title"]/text()').extract_first()
            }
Answer 0 (score: 0)
Assuming the titles are stored in a list titles and the corresponding contents in a list contents, you can call the following custom function write_to_csv(title, content) for each pair to write the content to a file saved under the name <title>.csv.
def write_to_csv(title, content=''):
    # If no content is provided, this creates an empty csv file.
    with open(title + '.csv', 'w') as f:
        f.write(content)

for content, title in zip(contents, titles):
    write_to_csv(title, content)
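
As written, write_to_csv assumes you have already collected titles and contents into parallel lists, which Scrapy's callback flow does not hand you by itself. Below is a minimal sketch of one way to do the collection inside the spider's parse_item, under two assumptions: that the show's title also appears in the sidebar_title div on each episode page (adjust the XPath if it does not), and using safe_filename, a hypothetical helper added here to strip characters that are invalid in file names.

import csv
import os
import re


def safe_filename(title):
    # Hypothetical helper: replace characters that are not
    # allowed in file names on most platforms.
    return re.sub(r'[\\/*?:"<>|]', '_', title).strip()


def parse_item(self, response):
    # Assumption: the episode page carries the show's title in the
    # same sidebar_title div that title_parse reads; verify this.
    title = response.xpath('//div[@class="sidebar_title"][1]/text()').extract_first()
    for el in response.xpath('//div[@id="topme"]'):
        name = el.xpath('.//div[@class="block_title"]/text()').extract_first()
        if title and name:
            filename = safe_filename(title) + '.csv'
            write_header = not os.path.exists(filename)
            # Append so all episodes of the same show end up in one file.
            with open(filename, 'a', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                if write_header:
                    writer.writerow(['name'])
                writer.writerow([name])
        yield {'name': name}

Since Scrapy runs callbacks in a single thread, appending one row per response is safe here. An item pipeline would be the more idiomatic home for this file handling (yield the title alongside the name and let the pipeline route rows to files), but the sketch keeps everything in the spider for brevity.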