Question

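I am trying to scrape the "People also search for" links from Google Search.

For example, when you go to Google and search for Christopher Nolan, Google also shows a "People also search for" section containing images of people related to the search term, here Christopher Nolan. In this case the "People also search for" results are Christian Bale, Emma Thomas, Zack Snyder, etc. I am interested in scraping this data.

I am using the Scrapy framework and wrote a simple scraper, but it returns an empty CSV file. Below is my code so far; any help is appreciated. I hope it is clear what I am trying to achieve. I used XPath Helper (a Google Chrome extension) to help find the XPath.

My code: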

# PyGSSpider (spiders folder)
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from PyGoogleSearch.items import PyGSItem
import sys

class PyGSSpider(CrawlSpider):
    name = "google"
    allowed_domains = ["www.google.com"]
    start_urls = ["https://www.google.com/#q=christopher+nolan"]

    #Extracts Christopher Nolan link     
    rules = [
        Rule(SgmlLinkExtractor(allow=("https://www.google.com/search?q=christpher+noaln&oq=christpher+noaln&aqs")), follow=True),
        Rule(SgmlLinkExtractor(allow=()), callback='parse_item')
    ]

    #Parse function for extracting the people also search link.
    def parse_item(self,response):
        self.log('Hi, this is an item page! %s' % response.url)
        sel=Selector(response)
        item=PyGSItem()
        item['peoplealsosearchfor'] = sel.xpath('//div[@id="cnt"]/@href').extract()

        return item

items.py:

from scrapy.item import Item, Field

class PyGSItem(Item):
    peoplealsosearchfor = Field()

Answer 1

This does not work because Google actively blocks bots from using its search.

However, you may have success using Selenium.
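
A minimal sketch of that idea, not a tested solution: it loads the results page in a real browser via Selenium and then pulls candidate links out of the rendered HTML. Several things here are assumptions. `fetch_rendered_html` and `extract_search_links` are hypothetical helper names, Firefox with a matching driver is assumed to be installed, and the rule that related entries are anchors whose `href` starts with `/search?` is a guess, since Google's markup is undocumented and changes often. The sketch requests `/search?q=...` rather than the `#q=...` form from the question, because a URL fragment is never sent to the server. Google may still answer with a CAPTCHA.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus


def fetch_rendered_html(query):
    """Load a Google results page in a real browser and return its HTML."""
    # Imported lazily so the parsing helper below works without Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get("https://www.google.com/search?q=" + quote_plus(query))
        return driver.page_source
    finally:
        driver.quit()


class SearchLinkParser(HTMLParser):
    """Collect (text, href) pairs for anchors that point at other searches."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: related entries link to "/search?..." URLs.
            if href.startswith("/search?"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_search_links(html):
    """Return a list of (link text, href) tuples found in the given HTML."""
    parser = SearchLinkParser()
    parser.feed(html)
    return parser.links
```

Usage would look like `extract_search_links(fetch_rendered_html("christopher nolan"))`. The result will include every internal search link on the page, so it would still need filtering down to the "People also search for" block once you have inspected the actual rendered markup.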
