I'm using the following code:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'open-uri'
require 'logger'
require 'slowweb'
require 'json'
SlowWeb.limit('linkedin.com', 1, 10)
# Create the agent
agent = Mechanize.new do |a|
  a.user_agent_alias = 'Mac Firefox'
  a.log = Logger.new('mech.log')
end
agent.follow_meta_refresh = true
page = agent.get("https://ca.linkedin.com/")
# Log in with the first form on the page
login_form = page.forms.first
login_form.session_key = "username"
login_form.session_password = "pass"
page = agent.submit(login_form, login_form.buttons.first)
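# Run the search; keep the URL as a string so agent.get receives a URL, not a page object
url = "https://www.linkedin.com/vsearch/f?type=all&keywords=Recruiter+Boston"
results = agent.get(url).body.scan(/\{"person":\{.*?\}\}/)
# Print the first and last name from each embedded person object
results.each do |person|
  json = JSON.parse(person)
  puts json['person']['firstName']
  puts json['person']['lastName']
end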
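This prints the people I'm currently connected to, so I know I'm logged in, but when I run the same search manually, it lists Boston recruiters.
I suspect my scraper is being detected and gamed (that is, served different results on purpose), but if you have any other ideas, I'd love to hear them.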
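
To help separate a parsing problem from a server-side one, here is a minimal sketch of the extraction step on its own, run against a string instead of a live page. The helper name extract_people and the sample body are made up for illustration; only the regex and the firstName/lastName keys come from the code above. Running it on a saved copy of the search response should show whether the Boston recruiters are present in the body the scraper receives.

```ruby
require 'json'

# Hypothetical helper (not part of Mechanize): pulls every {"person":{...}}
# fragment out of a response body and returns [firstName, lastName] pairs.
def extract_people(body)
  pairs = body.scan(/\{"person":\{.*?\}\}/).map do |fragment|
    begin
      person = JSON.parse(fragment)['person']
      [person['firstName'], person['lastName']]
    rescue JSON::ParserError
      # The non-greedy regex can cut a fragment short if the person
      # object contains nested objects; skip fragments that do not parse.
      nil
    end
  end
  pairs.compact
end

# Made-up sample body, for illustration only.
sample = 'x{"person":{"firstName":"Ann","lastName":"Lee"}}y' \
         '{"person":{"firstName":"Bo","lastName":"Diaz"}}z'

people = extract_people(sample)
people.each { |first, last| puts "#{first} #{last}" }
```

If the saved body only contains my connections, the difference is in what the server returns to the script, not in the parsing.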