Typhoeus: cannot find effective_url from the response

Time: 2016-02-17 07:43:16

Tags: ruby typhoeus

res = Typhoeus.get("http://atk.contacthr.com/38673270", 
                    followlocation: true, 
                    ssl_verifypeer: false,
                    maxredirs: 2, 
                    timeout: 60, 
                    connecttimeout: 60)
res.effective_url.to_s.force_encoding('UTF-8')
#=> "https://careers.atk.com/viewjob.html?erjob=55430:en_US&eresc=CareerBuilder%2Ecom"

This works fine. However, I cannot get the effective_url for a request to the following host:

http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20

res = Typhoeus.get("http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20", followlocation: true, ssl_verifypeer: false, maxredirs: 2, timeout: 60, connecttimeout: 60)

=> #<Typhoeus::Response:0xcdff578 @options={:httpauth_avail=>0, :total_time=>2.345899, :starttransfer_time=>2.345752, :appconnect_time=>0.0, :pretransfer_time=>0.004462, :connect_time=>0.004435, :namelookup_time=>0.004139, :redirect_time=>0.0, :effective_url=>"http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20", :primary_ip=>"46.254.116.54", :response_code=>200, :request_size=>169, :redirect_count=>0, :return_code=>:ok, :response_headers=>"HTTP/1.0 200 OK\r\nServer: nginx\r\nDate: Wed, 17 Feb 2016 07:39:00 GMT\r\nContent-Type: text/html; charset=UTF-8\r\nVary: Accept-Encoding\r\nVary: Accept-Encoding\r\nX-Cache: MISS from Dwarpal.localdomain\r\nX-Cache-Lookup: MISS from Dwarpal.localdomain:8080\r\nX-Cache: MISS from Dwarpal.localdomain\r\nX-Cache-Lookup: MISS from Dwarpal.localdomain:8080\r\nVia: 1.0 Dwarpal.localdomain:8080 (squid/2.6.STABLE22), 1.0 Dwarpal.localdomain:8080 (squid/2.6.STABLE22)\r\nConnection: close\r\n\r\n", :response_body=>"<!DOCTYPE html>\n<html>\n    <head>\n    <script type=\"text/javascript\">\n\n      var _gaq = _gaq || [];\n     
_gaq.push(['_setAccount', 'UA-18771510-2']);\n      _gaq.push(['_trackPageview']);\n\n      (function() {\n        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;\n        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';\n        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);\n      })();\n\n    </script>\n\n<!-- line 230 -->\n\n        <script type=\"text/javascript\">\n            function Adcourier() { this.locale = 'en-usa'; }\n\n            var Adcourier = new Adcourier();\n        </script>\n    <script type=\"text/javascript\">\n            function translate_js (translation_hash) {\n                var locale = Adcourier.locale;\n if ( !locale ) { locale = 'en'; }\n\n                var text = translation_hash[locale];\n\n                if( !text ){ text = translation_hash['en']; }\n\n                for (var i=1, len=arguments.length; i<len; i++) {\n                    var regex = new RegExp('%'+i+'%', 'g');\n                    text = text.replace(regex, arguments[i]);\n                }\n                return text;\n            }\n        </script><meta http-equiv=\"refresh\" content=\"0;url=https://lhcgroup-openhire.silkroad.com/epostings/index.cfm?fuseaction=app.jobinfo&jobid=3318&company_id=17106&version=1&source=CareerBuilder\">\n", :debug_info=>#<Ethon::Easy::DebugInfo:0xce7e328 @messages=[]>}, @request=#<Typhoeus::Request:0xce00bf8 @base_url="http://www.aplitrak.com/?adid=Q29sZXR0ZS4xMjIxNS43MDJAbGhjZy5hcGxpdHJhay5jb20", @original_options={:followlocation=>true, :ssl_verifypeer=>false, :maxredirs=>2, :timeout=>60, :connecttimeout=>60, :method=>:get}, @options={:followlocation=>true, :ssl_verifypeer=>false, :maxredirs=>2, :timeout=>60, :connecttimeout=>60, :method=>:get, :headers=>{"User-Agent"=>"Typhoeus - https://github.com/typhoeus/typhoeus"}}, @on_headers=[], @response=#<Typhoeus::Response:0xcdff578 
...>, @on_complete=[], @on_success=[]>>

Any help would be appreciated!

1 answer:

Answer 0 (score: 0)

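To get the effective URL, I parsed the response with the Nokogiri gem, because the target URL is embedded in the HTML body (in a meta refresh tag that follows the script tags) rather than returned as an HTTP redirect. That is why followlocation has nothing to follow and redirect_count stays at 0.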

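# Assumes the nokogiri gem and ActionView (Rails) are already loaded.
noko = Nokogiri::HTML(res.response_body)
# Take everything after the last "url", strip the markup, then drop newlines and ">".
url = ActionView::Base.full_sanitizer.sanitize(noko.to_s.split("url").last).gsub("\n", "").gsub(">", "")
url[0] = '' # remove the leading "=" left over from "url="
url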

#=> "http://jobs.brassring.com/1033/ASP/TG/cim_jobdetail.asp?partnerid=25561&amp;siteid=5140&amp;Areq=14808BR&amp;source=Careerbuilder"
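The string-splitting approach above depends on "url" appearing exactly once near the end of the page. As a rough alternative, the sketch below pulls the target straight out of the meta refresh tag using only the Ruby standard library. The helper name meta_refresh_url and both regular expressions are hypothetical, written for this illustration; they are not part of Typhoeus or Nokogiri, and a real HTML parser would be more robust against unusual markup.

```ruby
require "uri"

# Hypothetical patterns for this sketch: the first finds a
# <meta http-equiv="refresh" ...> tag, the second captures the part
# after "url=" inside that tag's quoted content attribute.
META_REFRESH_TAG = /<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i
CONTENT_URL      = /content\s*=\s*(["'])\s*\d*\s*;\s*url\s*=\s*(.*?)\1/i

# Returns the absolute meta refresh target found in html, or nil if
# the page has no such tag. base_url is used to resolve relative targets.
def meta_refresh_url(html, base_url)
  tag = html[META_REFRESH_TAG]
  return nil unless tag

  match = CONTENT_URL.match(tag)
  return nil unless match

  # Only the &amp; entity is decoded here; other entities are left as-is.
  target = match[2].strip.gsub("&amp;", "&")
  URI.join(base_url, target).to_s
end
```

With the response from the question, this would be called as meta_refresh_url(res.response_body, res.effective_url). Decoding &amp; avoids the escaped ampersands visible in the output above.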