How to get the number of matches for a word on google.com using scrapy

Date: 2017-12-28 04:05:38

Tags: python-3.x scrapy

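If I give a word to the scrapy application, it should search Google and print the number of matches for that word. The word must not be hardcoded in the application; it should be passed in from the console.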

import scrapy


class GogleSpider(scrapy.Spider):
    name = 'gogle'
    allowed_domains = ['google.co.in']
    start_urls = ['https://www.google.co.in/?gfe_rd=cr/']

    def parse(self, response):
        # TODO: search for the word and print how many times it matches
        pass

2 answers:

Answer 0 (score: 0)

Pass the word in as a spider argument, as in the documentation:
import scrapy


class GogleSpider(scrapy.Spider):

    name = 'gogle'

    allowed_domains = ['google.co.in']
    start_urls = ['https://www.google.co.in/?gfe_rd=cr/']

    def __init__(self, word=None, *args, **kwargs):
        # Scrapy passes each "-a name=value" option to __init__ as a keyword argument
        super(GogleSpider, self).__init__(*args, **kwargs)
        self.word = word

    def parse(self, response):
        print("word:", self.word)

Now you can run it from the console:

scrapy crawl gogle -a word=electronics

and inside parse() you get self.word with the value "electronics".

Answer 1 (score: -1)

import scrapy
import re


class GogleSpider(scrapy.Spider):
    name = 'gogle'
    allowed_domains = ['google.co.in']
    start_urls = ['https://www.google.co.in/?gfe_rd=cr/']

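    def __init__(self, word=None, *args, **kwargs):
        # Forward extra arguments so Scrapy's Spider.__init__ still receives them
        super(GogleSpider, self).__init__(*args, **kwargs)
        self.word = word

    def parse(self, response):
        # Text of the elements with class "sbqs_c" (the markup this answer targets;
        # Google's HTML changes often, so this XPath may match nothing)
        string = response.xpath('//div[@class="sbqs_c"]/text()').extract()
        string = ''.join(string)
        # Escape the word so regex characters are literal, and lowercase both sides
        print(len(re.findall(re.escape(self.word.lower()), string.lower())))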
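Answer 1 counts substrings, so the word is also counted when it appears inside a longer word. If only whole words should count, a regex word boundary (`\b`) can be added. A minimal standard-library sketch comparing the two approaches (the function names are illustrative, not from either answer):

```python
import re


def count_substring(text: str, word: str) -> int:
    """Case-insensitive substring count, as in answer 1 (with the word escaped)."""
    return len(re.findall(re.escape(word.lower()), text.lower()))


def count_whole_word(text: str, word: str) -> int:
    """Case-insensitive count of whole-word matches only, using word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


if __name__ == "__main__":
    sample = "Cat catalog, CAT and c.a.t"
    print(count_substring(sample, "cat"))   # 3: "catalog" is counted too
    print(count_whole_word(sample, "cat"))  # 2: only the standalone words
```

`\b` behaves as expected only when the word starts and ends with a letter, digit or underscore; for a term such as `c++` the boundary check would need a different pattern.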