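I want to write a Google search controller/parser to extract employees from Google's index of linkedin.com. LinkedIn closed their API, so I first wrote a Mechanize/Nokogiri scraper, which got me CAPTCHAs, so I rewrote the script using the Google search API gem.
The problem is that I can't work out where to start in making it bring back more than the first page of results, and the official documentation can't even be described as "sparse".
Here is the code that returns only page 1: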
require 'rubygems'
require 'google/api_client'
require 'json'
require 'pp'

puts "What organisation's employees shall we get today?"
organisation = gets.chomp
puts "Harvesting Google Search Results - This may take some time"

apikey = "1234"
cxid = "5678"

client = Google::APIClient.new(:key => apikey, :authorization => nil, :application_name => "linkedout", :application_version => "beta_0.5")
search = client.discovered_api('customsearch')

response = client.execute(
  :api_method => search.cse.list,
  :parameters => {
    'q' => 'current ' + organisation + ' site:linkedin.com',
    'maxResults' => 100,
    'key' => apikey,
    'cx' => cxid
  }
)
status, headers, body = response
jsonresponse = response.body

employees = []
@tags = JSON.parse(jsonresponse)['items']
@tags.each do |tag|
  x = tag['title']
  x.gsub!(/ \| LinkedIn/, "")
  x.downcase!
  x.gsub!(/ profiles/, "")
  employees << x
end
employees = employees.uniq
puts employees
Any help is greatly appreciated - I'm still learning this stuff.
Edit:
Here is a snippet of the JSON the Google API returns:
"items": [
{
"kind": "customsearch#result",
"title": "Tina Minor - Recruiter, The Walt Disney Company | LinkedIn",
"htmlTitle": "Tina Minor - Recruiter, The \u003cb\u003eWalt Disney\u003c/b\u003e Company | LinkedIn",
"link": "https://www.linkedin.com/pub/tina-minor-recruiter-the-walt-disney- company/5/849/5a6",
"displayLink": "www.linkedin.com",
"snippet": "View Tina Minor - Recruiter, The Walt Disney Company's professional profile on \n... Current. The Walt Disney Company. Previous. True Religion Brand Jeans, ...",
"htmlSnippet": "View Tina Minor - Recruiter, The \u003cb\u003eWalt Disney\u003c/b\u003e Company's professional profile on \u003cbr\u003e\n... \u003cb\u003eCurrent\u003c/b\u003e. The \u003cb\u003eWalt Disney\u003c/b\u003e Company. Previous. True Religion Brand Jeans, ...",
"formattedUrl": "https://www.linkedin.com/pub/tina-minor-recruiter-the- walt- disney.../5a6",
"htmlFormattedUrl": "https://www.linkedin.com/pub/tina-minor-recruiter- the- \u003cb\u003ewalt\u003c/b\u003e- \u003cb\u003edisney\u003c/b\u003e.../5a6",
"pagemap": {
"cse_image": [
{
"src": "https://media.licdn.com/mpr/mpr/shrink_200_200/p/8/005/09b/3f2/1eb6f83.jpg"
}
],
"person": [
{
"location": "Greater Los Angeles Area",
"role": "Recruiter, Talent Acquisition at The Walt Disney Company"
}
],
"cse_thumbnail": [
{
"width": "160",
"height": "160",
"src": "https://encrypted-tbn1.gstatic.com/images? q=tbn:ANd9GcTbmlbDVBOMKTtOA_D88aFaPuZ9MjABABwumzBPk0F2x2P2-0puaIRlktce"
}
],
"metatags": [
{
"globaltrackingurl": "//www.linkedin.com/mob/tracking",
"globaltrackingappname": "profile",
"globaltrackingappid": "webTracking",
"lnkd-track-json-lib": "https://static.licdn.com/scds/concat/common/js? h=2jds9coeh4w78ed9wblscv68v-ebbt2vixcc5qz0otts5io08xv&fc=2",
"treeid": "SnQhTqcr1RNgnKS8RSsAAA==",
"appname": "profile",
"pageimpressionid": "29ca4803-0233-4934-955a-1959a37dfbbf",
"pagekey": "nprofile_v2_public_fs",
"analyticsurl": "/analytics/noauthtracker",
"msapplication-tileimage": "https://static.licdn.com/scds/common/u/images/logos/linkedin/logo-in-win8-tile- 144_v1.png",
"msapplication-tilecolor": "#0077B5",
"application-name": "LinkedIn",
"remote-nav-init-marker": "true"
}
],
"hcard": [
{
"fn": "Tina Minor - Recruiter, The Walt Disney Company",
"title": "Recruiter, Talent Acquisition at The Walt Disney Company"
}
]
}
}
...
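Answer 0 (score: 0)
According to Search request metadata, if there are more results, a nextPage
value should be returned alongside items. However, it always says Note: This API returns up to the first 100 results only.
So it looks like you have already got the maximum number of results.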
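For reference, here is a minimal sketch of how paging could be driven from that nextPage metadata. It assumes the Custom Search JSON API's `num` parameter (at most 10 results per request) and `start` parameter (1-based index of the first result), and that each response reports the next offset under `queries.nextPage[0].startIndex`. The helper name `collect_items` and the stubbed pages are hypothetical, made up for illustration, and the stub stands in for real API calls so the sketch runs offline.

```ruby
# Hypothetical helper (not part of any gem): follows queries.nextPage[0].startIndex
# from page to page until no next page is reported or max_results items are collected.
# The block receives a 1-based start index and must return the parsed JSON
# response (a Hash) for that page.
def collect_items(max_results = 100)
  items = []
  start = 1
  while start && items.size < max_results
    page = yield(start)
    items.concat(page.fetch('items', []))
    next_page = page.dig('queries', 'nextPage', 0)
    start = next_page && next_page['startIndex']
  end
  items.first(max_results)
end

# Stubbed pages standing in for real API responses (assumed shape).
fake_pages = {
  1 => { 'items' => [{ 'title' => 'A' }, { 'title' => 'B' }],
         'queries' => { 'nextPage' => [{ 'startIndex' => 3 }] } },
  3 => { 'items' => [{ 'title' => 'C' }], 'queries' => {} }
}
titles = collect_items { |start| fake_pages.fetch(start) }.map { |item| item['title'] }
puts titles

# With the client from the question, the block might look like this (untested;
# `query` is a placeholder for the search string):
#   collect_items do |start|
#     response = client.execute(
#       :api_method => search.cse.list,
#       :parameters => { 'q' => query, 'cx' => cxid, 'num' => 10, 'start' => start }
#     )
#     JSON.parse(response.body)
#   end
```

Passing the page fetcher as a block keeps the paging logic separate from the HTTP client, so it can be tried against canned responses before spending API quota. The default cap of 100 mirrors the limit quoted in the note above.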