Search for a keyword in an HTTP response

Date: 2013-06-06 19:24:31

Tags: ruby parsing nokogiri

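I'm using Ruby to send an HTTP request and receive the response. Now I need to search the response for a keyword. I parse it with Nokogiri::HTML(response.body). Below is the part of the HTTP response in which I want to look for the keyword: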
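If the goal is only to find out whether the keyword occurs, the node type does not matter: response.body is a plain Ruby String and can be searched before Nokogiri is involved at all. A minimal sketch using a shortened excerpt of the body quoted in this question; find_keyword is a hypothetical helper, not part of Net::HTTP or Nokogiri:

```ruby
# Minimal sketch: look for a keyword in the raw response body, before any parsing.
# `find_keyword` is a hypothetical helper, not part of Net::HTTP or Nokogiri.

# Returns the character offset of every non-overlapping occurrence of +keyword+.
def find_keyword(body, keyword)
  return [] if keyword.empty? # an empty keyword would otherwise loop forever

  offsets = []
  start = 0
  while (idx = body.index(keyword, start))
    offsets << idx
    start = idx + keyword.length
  end
  offsets
end

# Shortened excerpt of the response body quoted in the question.
body = %q{<li><a href="javascript:void(0)" onclick="window.open( '/manual/apple.xml', '' );">Example with Coffee</a></li>}

puts body.include?("Example with Coffee") # prints true
p find_keyword(body, "Coffee")
```

This finds the text no matter whether the parser later turns it into an element, a text node or a comment, but it gives offsets rather than structured nodes.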

#<Nokogiri::XML::Comment:0x0c24940 "\n\t<div style=\"float:left;margin-left:20px;width:200px;\">\n\t<p><strong>Check this</strong></p>\n\t<ul>\n\t\t<li><a href='http://www.example.com' target='_blank'>example.com</a></li>\n\t\t<li><a href='http://www.example.com/hello' target='_blank'>example only</a></li>\n\t\t<li><a href=\"javascript:void(0)\" onclick=\"window.open( '/path/to/example.xml', '', 'scrollbars=yes,menubar=no,height=800,width=1000,resizable=yes,toolbar=no,location=no,status=no' );\">Example with Coffee</a></li>\n\t</ul>\n\n\t</div>\n">

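To be clear, besides the part above, the HTTP response contains several other components, such as:

#<Nokogiri::XML::Text, ::Element

and more ::Comment nodes, each with a different numeric ID, such as #<Nokogiri::XML::Comment:0x0c24940 and #<Nokogiri::XML::Text:0x1e21321.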

How can I use Nokogiri to retrieve the string "Example with Coffee" from the HTTP response?

Update: adding part of the HTML response:

#<Nokogiri::HTML::Document:0x53a5ef4 name="document" children=[#<Nokogiri::XML::DTD:0x53a0cb4 name="html">, #<Nokogiri::XML::Element:0x53112c8 name="html" children=[#<Nokogiri::XML::Text:0x51e3f74 "\n\t">, #<Nokogiri::XML::Element:0x52d2168 name="head" children=[#<Nokogiri::XML::Text:0x51c3738 "\n\t\t\n\t\t">, #<Nokogiri::XML::Comment:0x523dfa4 " These meta tags will prevent the browser from caching the page --!>\n\t\t<meta http-equiv='cache-control' content='no-cache'>\n\t\t<meta http-equiv='expires' content='0'>\n\t\t<meta http-equiv='pragma' content='no-cache'>\n\n\t\t<!-- Stylesheets ">, #<Nokogiri::XML::Text:0x51b9d3c " \n\t\t">, #<Nokogiri::XML::Element:0x52ae0b2 name="link" attributes=[#<Nokogiri::XML::Attr:0x52accc0 name="type" value="text/css">, #<Nokogiri::XML::Attr:0x52s2cfc name="rel" value="stylesheet">, #<Nokogiri::XML::Attr:0x52a1eb4 name="href" value="/styles/main.css">]>, #<Nokogiri::XML::Text:0x5251b70 "\n\t\t">, #<Nokogiri::XML::Element:0x5d59ed0 name="style" attributes=[#<Nokogiri::XML::Attr:0x5c5ab20 name="type" value="text/css">] children=[#<Nokogiri::XML::CDATA:0x51eca30 "\n\t\t\tul li { margin-left: -20px; }\n\t\t\t#checkPage1 { \n\t\t\t\tdisplay: inline-block;\n\t\t\t\t*display: inline;\n\t\t\t\tborder: 1px solid #618CB3;\n\t\t\t\tbackground-color: #FEFEFE;\n\t\t\t\tpadding: 3px;\n\t\t\t}\n\t\t\t

Here is the raw HTTP request and response:

>> uri = URI.parse("http://<1.2.3.4>/about/index.php")
=> #<URI::HTTP:0x00000008ee6760 URL:http://1.2.3.4/about/index.php>
>> http = Net::HTTP.new(uri.host, uri.port)
=> #<Net::HTTP 1.2.3.4 open=false>
>> request = Net::HTTP::Get.new(uri.request_uri)
=> #<Net::HTTP::Get GET>
>> request['Cookie'] = 'Fruits=12345'
=> "Fruits=12345"
>> response = http.request(request)
=> #<Net::HTTPOK 200 OK readbody=true>
>> response.body
=> "<html>\n\t<head>\n\t\t\n\t\t<!-- These meta tags will prevent the browser from caching the page --!>\n\t\t<meta http-equiv='cache-control' content='no-cache'>\n\t\t<meta http-equiv='expires' content='0'>\n\t\t<meta http-equiv='pragma' content='no-cache'>\n\n\t\t<!-- Stylesheets --> \n\t\t<link type=\"text/css\" rel=\"stylesheet\" href=\"/styles/main.css\">\n\t\t<style type=\"text/css\">\n\t\t\tul li { margin-left: -20px; }\n\t\t\t#licDiv { \n\t\t\t\tdisplay: inline-block;\n\t\t\t\t*display: inline;\n\t\t\t\tborder: 1px solid #6186B3;\n\t\t\t\tbackground-color: #FEFEFE;\n\t\t\t\tpadding: 3px;\n\t\t\t}\n\t\t\t#licDiv p {\n\t\t\t\tmargin: 2px 0;\n\t\t\t}\n\t\t\t.nobullets {\n\t\t\t\tpadding-left: 10;\n\t\t\t}\n\t\t\t.nobullets li {\n\t\t\t\tlist-style: none;\n\t\t\t}\n\t\t</style>\n\t\t\n\t\t<!-- Scripts -->\n\t\t<script type=\"text/javascript\" src=\"/GlobalData.js\"></script>\n\t\t<script type=\"text/javascript\">\n\t\t\tfunction fn1() {\n\n\t\t\t\t// Disable Caching\n\t\t\t\t//$.ajaxSetup({cache: false}});\n\n\t\t\t\tif( window.XMLHttpRequest ) {\n\t\t\t\t\t// Code for IE7+, Firefox, Chrome, Opera, Safari\n\t\t\t\t\txmlhttp = new XMLHttpRequest();\n\t\t\t\t}\n\t\t\t\telse {\n\t\t\t\t\t// Code for IE6\n\t\t\t\t\txmlhttp = new ActiveXObject( \"Microsoft.XMLHTTP\" );\n\t\t\t\t}\n\t\t\t\txmlhttp.onreadystatechange = function() {\n\t\t\t\t\tif( xmlhttp.readyState == 4 && xmlhttp.status == 200 ) {\n\t\t\t\t\t\tcheck2 = xmlhttp.responseText;\n\t\t\t\t\t\tif( 0 == check2 ) {\n\t\t\t\t\t\t\talert( \"msg1.\" );\n\t\t\t\t\t\t}\n\t\t\t\t\t\telse {\n\t\t\t\t\t\t\talert( \"Issue1.\" );\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\txmlhttp.open( \"GET\", \"page1.php\", true );\n\t\t\t\txmlhttp.send();\n\t\t\t}\n\t\t</script>\n\n\t</head>\n\t\n\t<body style=\"\">\n\n\t\t<!-- Page Title -->\n\t\t<div class=\"pageTitle\">\n\t\t\t<img src=\"/images/page_about.png\"/>\n\t\t\tAbout\t\t</div>\n\n\t\t<!-- The Main Area -->\n\t\t<div class=\"container\" 
style=\"height:700px;\">\n\n\t<p><strong>This is a big juicy \"apple\" (Fruit)</strong></p>\n\t<ul>\n<div style=\"float:left;margin-left:20px;width:200px;\">\n\t<p><strong>Helpful Links</strong></p>\n\t<ul>\n\t\t<li><a href='http://www.fruits.com' target='_blank'>Fruits.com</a></li>\n\t\t<li><a href='http://www.fruits.com/apple' target='_blank'>Apple Fruit</a></li>\n\t\t<li><a href=\"javascript:void(0)\" onclick=\"window.open( '/manual/apple.xml', '', 'scrollbars=yes,menubar=no,height=800,width=1000,resizable=yes,toolbar=no,location=no,status=no' );\">Example with Coffee</a></li>\n\t</ul>\n\n\t</div>\n-->\n\t<p style=\"clear:both;\">
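The irb session above can be turned into a small script. A sketch using only Ruby's standard library; the method names build_request and fetch_body are hypothetical, and 1.2.3.4 is the placeholder address from the question:

```ruby
require 'net/http'
require 'uri'

# Builds the same GET request as the irb session: request path plus a Cookie header.
def build_request(uri, cookie)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie
  request
end

# Sends the request and returns the body as a String.
def fetch_body(url, cookie)
  uri = URI.parse(url)
  # The block form of Net::HTTP.start closes the connection when the block returns.
  Net::HTTP.start(uri.host, uri.port) do |http|
    response = http.request(build_request(uri, cookie))
    response.value # raises a Net::HTTP error unless the status is 2xx
    response.body
  end
end
```

For example, fetch_body("http://1.2.3.4/about/index.php", "Fruits=12345") (with the real host instead of the placeholder) returns the same string as response.body in the session, which can then be searched directly or passed to Nokogiri::HTML.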

HTML page source:

http://pastie.org/8020148#10

0 answers