res = Typhoeus.get("http://atk.contacthr.com/38673270",
  followlocation: true,
  ssl_verifypeer: false,
  maxredirs: 2,
  timeout: 60,
  connecttimeout: 60)
res.effective_url.to_s.force_encoding('UTF-8')
#=> "https://careers.atk.com/viewjob.html?erjob=55430:en_US&eresc=CareerBuilder%2Ecom
This works well! But I can't get the effective_url from a request to the following host:
http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20
res = Typhoeus.get("http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20", followlocation: true, ssl_verifypeer: false, maxredirs: 2, timeout: 60, connecttimeout: 60)
=> #<Typhoeus::Response:0xcdff578 @options={:httpauth_avail=>0, :total_time=>2.345899, :starttransfer_time=>2.345752, :appconnect_time=>0.0, :pretransfer_time=>0.004462, :connect_time=>0.004435, :namelookup_time=>0.004139, :redirect_time=>0.0, :effective_url=>"http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20", :primary_ip=>"46.254.116.54", :response_code=>200, :request_size=>169, :redirect_count=>0, :return_code=>:ok, :response_headers=>"HTTP/1.0 200 OK\r\nServer: nginx\r\nDate: Wed, 17 Feb 2016 07:39:00 GMT\r\nContent-Type: text/html; charset=UTF-8\r\nVary: Accept-Encoding\r\nVary: Accept-Encoding\r\nX-Cache: MISS from Dwarpal.localdomain\r\nX-Cache-Lookup: MISS from Dwarpal.localdomain:8080\r\nX-Cache: MISS from Dwarpal.localdomain\r\nX-Cache-Lookup: MISS from Dwarpal.localdomain:8080\r\nVia: 1.0 Dwarpal.localdomain:8080 (squid/2.6.STABLE22), 1.0 Dwarpal.localdomain:8080 (squid/2.6.STABLE22)\r\nConnection: close\r\n\r\n", :response_body=>"<!DOCTYPE html>\n<html>\n <head>\n <script type=\"text/javascript\">\n\n var _gaq = _gaq || [];\n
_gaq.push(['_setAccount', 'UA-18771510-2']);\n _gaq.push(['_trackPageview']);\n\n (function() {\n var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;\n ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';\n var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);\n })();\n\n </script>\n\n<!-- line 230 -->\n\n <script type=\"text/javascript\">\n function Adcourier() { this.locale = 'en-usa'; }\n\n var Adcourier = new Adcourier();\n </script>\n <script type=\"text/javascript\">\n function translate_js (translation_hash) {\n var locale = Adcourier.locale;\n if ( !locale ) { locale = 'en'; }\n\n var text = translation_hash[locale];\n\n if( !text ){ text = translation_hash['en']; }\n\n for (var i=1, len=arguments.length; i<len; i++) {\n var regex = new RegExp('%'+i+'%', 'g');\n text = text.replace(regex, arguments[i]);\n }\n return text;\n }\n </script><meta http-equiv=\"refresh\" content=\"0;url=https://lhcgroup-openhire.silkroad.com/epostings/index.cfm?fuseaction=app.jobinfo&jobid=3318&company_id=17106&version=1&source=CareerBuilder\">\n", :debug_info=>#<Ethon::Easy::DebugInfo:0xce7e328 @messages=[]>}, @request=#<Typhoeus::Request:0xce00bf8 @base_url="http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20", @original_options={:followlocation=>true, :ssl_verifypeer=>false, :maxredirs=>2, :timeout=>60, :connecttimeout=>60, :method=>:get}, @options={:followlocation=>true, :ssl_verifypeer=>false, :maxredirs=>2, :timeout=>60, :connecttimeout=>60, :method=>:get, :headers=>{"User-Agent"=>"Typhoeus - https://github.com/typhoeus/typhoeus"}}, @on_headers=[], @response=#<Typhoeus::Response:0xcdff578 ...>, @on_complete=[], @on_success=[]>>
Any help would be appreciated!
Answer 0 (score: 0)
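To get the effective_url, I parsed the response with the Nokogiri gem, because the target URL is in the response body itself (in the meta refresh tag that follows the script tags).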
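require 'nokogiri'
require 'action_view' # provides ActionView::Base.full_sanitizer outside a Rails app
noko = Nokogiri::HTML(res.response_body)
# Take everything after the last "url" in the document, strip the remaining tags, newlines, and ">"
url = ActionView::Base.full_sanitizer.sanitize(noko.to_s.split("url").last).gsub("\n", "").gsub(">", "")
url[0] = '' # drop the leading character left over from the split
url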
#=> "http://jobs.brassring.com/1033/ASP/TG/cim_jobdetail.asp?partnerid=25561&siteid=5140&Areq=14808BR&source=Careerbuilder"
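The response dump in the question shows redirect_count=>0 and a meta http-equiv="refresh" tag in the body, so the redirect appears to happen in the HTML rather than over HTTP, which would explain why followlocation never sees it. Splitting on "url" works for this page but breaks if "url" appears anywhere later in the document. Below is a minimal, dependency-free sketch that reads the meta refresh tag directly with a regular expression instead of Nokogiri. The helper name meta_refresh_url and the example.com URL are made up for illustration, and a regex is only an approximation of real HTML parsing.

```ruby
require 'cgi'

# Hypothetical helper (not part of Typhoeus or Nokogiri): returns the target
# URL of the first <meta http-equiv="refresh"> tag in html, or nil if none.
def meta_refresh_url(html)
  # Collect all <meta ...> tags and keep the first one declaring a refresh.
  tag = html.scan(/<meta\b[^>]*>/i).find { |t| t =~ /http-equiv\s*=\s*["']?refresh/i }
  return nil unless tag

  # The content attribute looks like: content="0;url=https://example.com/..."
  content = tag[/content\s*=\s*"([^"]*)"/i, 1] || tag[/content\s*=\s*'([^']*)'/i, 1]
  return nil unless content

  # Everything after the first "url=" is the redirect target.
  url = content[/url\s*=\s*(.+)\z/i, 1]
  return nil unless url

  # Attribute values may contain HTML entities such as &amp;
  CGI.unescapeHTML(url.strip)
end

# Made-up sample document for illustration.
html = '<html><head><meta http-equiv="refresh" ' \
       'content="0;url=https://example.com/job?id=1&amp;source=test"></head></html>'
puts meta_refresh_url(html)
```

With Typhoeus, the same helper could be called as meta_refresh_url(res.response_body), and the returned URL passed to a second Typhoeus.get if the final page itself is needed.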