Trying to save only web pages that contain a specific keyword

Asked: 2015-11-30 22:45:30

Tags: python web-scraping web-crawler scrapy extract

I am trying to crawl pages that contain a specific keyword and then save the pages that contain it, or at least the URLs that point to them. I tried the code below, but it did not help. Is this even possible?

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request
from FinalSpider.items import Page  # Defined in items.py


class FinalSpider(CrawlSpider):
    name = "FinalSpider"
    allowed_domains = ['url']
    start_urls = ['url.com/=%d' % n
                  for n in range(0, 20)]

    def parse(self, response):
        # Follow only links whose anchor text is "100.00".
        for link in response.xpath('//a[text()="100.00"]/@href').extract():
            yield Request(url=link, callback=self.parse_link)

    def parse_link(self, response):
        # Save the fetched page to a file named after part of its URL.
        filename = response.url.split("/")[2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

Here is my items.py code:

import scrapy

class Page(scrapy.Item):
    url = scrapy.Field()
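
For context, this is a minimal sketch of the behaviour I am after, assuming the keyword (for example "100.00") should be checked against the body of each followed page before anything is saved. The spider name, the KEYWORD constant, and the filename scheme are placeholders, not my real site:

import scrapy
from FinalSpider.items import Page  # Defined in items.py

KEYWORD = b'100.00'  # placeholder keyword; response.body is bytes

class KeywordSpider(scrapy.Spider):
    name = "KeywordSpider"
    start_urls = ['url.com/=%d' % n for n in range(0, 20)]

    def parse(self, response):
        # Follow every link on the start pages and check each target page.
        for link in response.xpath('//a/@href').extract():
            yield scrapy.Request(response.urljoin(link), callback=self.parse_link)

    def parse_link(self, response):
        # Keep the page only if its body contains the keyword.
        if KEYWORD in response.body:
            filename = response.url.split("/")[2] + '.html'
            with open(filename, 'wb') as f:
                f.write(response.body)
            # Also record the matching URL as an item.
            item = Page()
            item['url'] = response.url
            yield item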

0 Answers:

No answers yet