Google Custom Search API and Ruby

Date: 2015-05-18 20:32:55

Tags: ruby google-api

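I want to write a Google search controller/parser that pulls employees out of Google's index of linkedin.com. LinkedIn closed their API, so I first wrote a Mechanize/Nokogiri scraper, which got me captchas, so I rewrote the script using the Google search API gem.

The problem is that I can't figure out where to start to make it bring back more than the first page of results, and the official documentation can't even be described as "sparse".

Here is the code, which returns page 1 only: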

require 'rubygems'
require 'google/api_client'
require 'json'
require 'pp'
puts "What organisation's employees shall we get today?"
organisation = gets.chomp
puts "Harvesting Google Search Results - This may take some time"

apikey = "1234"
cxid = "5678"
client = Google::APIClient.new(
  :key => apikey,
  :authorization => nil,
  :application_name => "linkedout",
  :application_version => "beta_0.5"
)
search = client.discovered_api('customsearch')
response = client.execute(
  :api_method => search.cse.list,
  :parameters => {
    'q' => 'current ' + organisation + ' site:linkedin.com',
    'maxResults' => 100,
    'key' => apikey,
    'cx' => cxid
  }
)

status, headers, body = response
jsonresponse = response.body

employees = []
@tags = JSON.parse(jsonresponse)['items']
@tags.each do |tag|
  x = tag['title']
  x.gsub!(/ \| LinkedIn/, "")
  x.downcase!
  x.gsub!(/ profiles/, "")
  employees << x
end
employees = employees.uniq
puts employees

Any help is much appreciated - I'm still learning this stuff.

Edit:

Here is a snippet of the JSON that the Google API returns:

"items": [
  {
   "kind": "customsearch#result",
   "title": "Tina Minor - Recruiter, The Walt Disney Company | LinkedIn",
   "htmlTitle": "Tina Minor - Recruiter, The \u003cb\u003eWalt Disney\u003c/b\u003e Company | LinkedIn",
   "link": "https://www.linkedin.com/pub/tina-minor-recruiter-the-walt-disney-company/5/849/5a6",
   "displayLink": "www.linkedin.com",
   "snippet": "View Tina Minor - Recruiter, The Walt Disney Company's professional profile on \n... Current. The Walt Disney Company. Previous. True Religion Brand Jeans, ...",
   "htmlSnippet": "View Tina Minor - Recruiter, The \u003cb\u003eWalt Disney\u003c/b\u003e Company&#39;s professional profile on \u003cbr\u003e\n... \u003cb\u003eCurrent\u003c/b\u003e. The \u003cb\u003eWalt Disney\u003c/b\u003e Company. Previous. True Religion Brand Jeans,&nbsp;...",
   "formattedUrl": "https://www.linkedin.com/pub/tina-minor-recruiter-the-walt-disney.../5a6",
   "htmlFormattedUrl": "https://www.linkedin.com/pub/tina-minor-recruiter-the-\u003cb\u003ewalt\u003c/b\u003e-\u003cb\u003edisney\u003c/b\u003e.../5a6",
   "pagemap": {
    "cse_image": [
     {
      "src": "https://media.licdn.com/mpr/mpr/shrink_200_200/p/8/005/09b/3f2/1eb6f83.jpg"
     }
    ],
    "person": [
     {
      "location": "Greater Los Angeles Area",
      "role": "Recruiter, Talent Acquisition at The Walt Disney Company"
     }
    ],
    "cse_thumbnail": [
     {
      "width": "160",
      "height": "160",
      "src": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTbmlbDVBOMKTtOA_D88aFaPuZ9MjABABwumzBPk0F2x2P2-0puaIRlktce"
     }
    ],
    "metatags": [
     {
      "globaltrackingurl": "//www.linkedin.com/mob/tracking",
      "globaltrackingappname": "profile",
      "globaltrackingappid": "webTracking",
      "lnkd-track-json-lib": "https://static.licdn.com/scds/concat/common/js?h=2jds9coeh4w78ed9wblscv68v-ebbt2vixcc5qz0otts5io08xv&fc=2",
      "treeid": "SnQhTqcr1RNgnKS8RSsAAA==",
      "appname": "profile",
      "pageimpressionid": "29ca4803-0233-4934-955a-1959a37dfbbf",
      "pagekey": "nprofile_v2_public_fs",
      "analyticsurl": "/analytics/noauthtracker",
      "msapplication-tileimage": "https://static.licdn.com/scds/common/u/images/logos/linkedin/logo-in-win8-tile-144_v1.png",
      "msapplication-tilecolor": "#0077B5",
      "application-name": "LinkedIn",
      "remote-nav-init-marker": "true"
     }
    ],
    "hcard": [
     {
      "fn": "Tina Minor - Recruiter, The Walt Disney Company",
      "title": "Recruiter, Talent Acquisition at The Walt Disney Company"
     }
    ]
   }
  }
  ...
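
The snippet above carries more structure than the `title` string alone: the `pagemap` object has `person` and `hcard` entries with a role and a location. A minimal sketch of reading those fields follows. The helper name `employee_from_item` is hypothetical, the split on " - " assumes titles follow the "Name - Role" pattern seen in this one sample, and `pagemap` fields are page-dependent, so every lookup falls back gracefully when a key is missing.

```ruby
require 'json'

# Hypothetical helper (not part of any gem): turn one result item into a
# small hash. Assumes the "Name - Role" title pattern from the sample above.
def employee_from_item(item)
  pagemap = item['pagemap'] || {}
  hcard   = (pagemap['hcard'] || []).first || {}
  person  = (pagemap['person'] || []).first || {}

  # Fall back to the result title, minus the " | LinkedIn" suffix,
  # when the page exposes no hcard.
  label = hcard['fn'] || item['title'].to_s.sub(/ \| LinkedIn\z/, '')

  {
    name: label.split(' - ', 2).first,
    role: person['role'] || hcard['title'],
    location: person['location']
  }
end

# Trimmed copy of the item shown above.
sample = JSON.parse(<<~SAMPLE)
  {
    "title": "Tina Minor - Recruiter, The Walt Disney Company | LinkedIn",
    "pagemap": {
      "person": [
        { "location": "Greater Los Angeles Area",
          "role": "Recruiter, Talent Acquisition at The Walt Disney Company" }
      ],
      "hcard": [
        { "fn": "Tina Minor - Recruiter, The Walt Disney Company",
          "title": "Recruiter, Talent Acquisition at The Walt Disney Company" }
      ]
    }
  }
SAMPLE

employee = employee_from_item(sample)
puts employee[:name]
puts employee[:role]
puts employee[:location]
```

Results without a `pagemap` still yield a name derived from the title, with `role` and `location` left as `nil`.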

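1 answer:

Answer 0 (score: 0)

According to the Search request metadata section of the documentation, a nextPage value should be returned alongside the items if there are additional results. However, it also says "Note: This API returns up to the first 100 results only.", so it looks like you are already getting the maximum number of results.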
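
Following the nextPage value mentioned above could look roughly like the sketch below. It rests on assumptions not stated in this thread: that each request returns at most 10 items, that the request takes a 1-based `start` parameter (with `num` for page size, rather than `maxResults`), and that the next start index appears at `queries.nextPage[0].startIndex` in the response. The helper `collect_items` and the `fetch_page` callable are hypothetical names; the HTTP call is kept behind `fetch_page` so the loop can be exercised without network access.

```ruby
# Hypothetical helper: gather items across pages by following nextPage.
# fetch_page is any callable that takes a 1-based start index and returns
# the parsed JSON response (a Hash) for that page.
MAX_RESULTS = 100 # the cap quoted from the documentation above

def collect_items(fetch_page, limit: MAX_RESULTS)
  items = []
  start = 1
  while start && start <= limit && items.size < limit
    page  = fetch_page.call(start)
    batch = page['items'] || []
    break if batch.empty?

    items.concat(batch)
    # Assumption: the next start index lives at queries.nextPage[0].startIndex
    # and is absent on the last page.
    next_page = page.dig('queries', 'nextPage', 0)
    start = next_page && next_page['startIndex']
  end
  items.first(limit)
end

# Stand-in fetcher that simulates 25 results served 10 at a time.
fake_fetch = lambda do |start|
  all   = (1..25).map { |i| { 'title' => "Person #{i}" } }
  batch = all[start - 1, 10] || []
  resp  = { 'items' => batch }
  nxt   = start + batch.size
  resp['queries'] = { 'nextPage' => [{ 'startIndex' => nxt }] } if nxt <= all.size
  resp
end

items = collect_items(fake_fetch)
puts items.size
```

With the client from the question, `fetch_page` could be a lambda that calls `client.execute` with `'start' => start` and `'num' => 10` added to `:parameters` and returns `JSON.parse(response.body)`; that wiring is untested here.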