I need to parse a form to get the value of `IW_SessionID_` from the HTML I receive, but I can't get it to work.
#!/usr/bin/ruby
require 'pp'
require 'nokogiri'
require 'mechanize'
r = '<HTML><HEAD><TITLE></TITLE><meta http-equiv=\"cache-control\" content=\"no-cache\">\r\n<meta http-equiv=\"pragma\" content=\"no-cache\">\r\n<NOSCRIPT><HTML><BODY>Your browser does not seem to support JavaScript. Please make sure it is supported and activated</BODY></HTML></NOSCRIPT>\r\n<SCRIPT>\r\nvar ie4 = (document.all)? true:false;\r\nvar ns6 = (document.getElementById)? true && !ie4:false;\r\nfunction Initialize() {\r\nvar lWidth;\r\nvar lHeight;\r\nif (ns6) {\r\n lWidth = window.innerWidth - 30;\r\n lHeight = window.innerHeight - 30;\r\n} else {\r\n lWidth = document.body.clientWidth;\r\n lHeight = document.body.clientHeight;\r\n if (lWidth == 0) { lWidth = undefined;}\r\n if (lHeight == 0) { lHeight = undefined;}\r\n}\r\ndocument.forms[0].elements[\"IW_width\"].value = lWidth;\r\ndocument.forms[0].elements[\"IW_height\"].value = lHeight;\r\ndocument.forms[0].submit();\r\n}</SCRIPT></HEAD><BODY onload=\"Initialize()\">\r\n<form method=post action=\"/bwtem\">\r\n<input type=hidden name=\"IW_width\">\r\n<input type=hidden name=\"IW_height\">\r\n<input type=hidden name=\"IW_SessionID_\" value=\"1wqzj1f0vec57r1apfqg51wzs88c\">\r\n<input type=hidden name=\"IW_TrackID_\" value=\"0\">\r\n</form></BODY></HTML>'
page = Nokogiri::HTML r
puts page.css('form[name="IW_SessionID_"]')
a = Mechanize.new
page2 = Mechanize::Page.new(nil,{'content-type'=>'text/html'},r,nil,a)
pp page2.form_with(:name => "IW_SessionID_")
The script only returns nil.
Can anyone figure out how to get the value of IW_SessionID_?
Answer 0: (score: 0)
You have to parse the sample HTML string, then search for the input whose name is IW_SessionID_.
This sample code works for me:
#!/usr/bin/ruby
require 'pp'
require 'nokogiri'
require 'mechanize'
r = '<HTML><HEAD><TITLE></TITLE><meta http-equiv="cache-control" content="no-cache">\r\n<meta http-equiv="pragma" content="no-cache">\r\n<NOSCRIPT><HTML><BODY>Your browser does not seem to support JavaScript. Please make sure it is supported and activated</BODY></HTML></NOSCRIPT>\r\n<SCRIPT>\r\nvar ie4 = (document.all)? true:false;\r\nvar ns6 = (document.getElementById)? true && !ie4:false;\r\nfunction Initialize() {\r\nvar lWidth;\r\nvar lHeight;\r\nif (ns6) {\r\n lWidth = window.innerWidth - 30;\r\n lHeight = window.innerHeight - 30;\r\n} else {\r\n lWidth = document.body.clientWidth;\r\n lHeight = document.body.clientHeight;\r\n if (lWidth == 0) { lWidth = undefined;}\r\n if (lHeight == 0) { lHeight = undefined;}\r\n}\r\ndocument.forms[0].elements["IW_width"].value = lWidth;\r\ndocument.forms[0].elements["IW_height"].value = lHeight;\r\ndocument.forms[0].submit();\r\n}</SCRIPT></HEAD><BODY onload="Initialize()">\r\n<form method=post action="/bwtem">\r\n<input type=hidden name="IW_width">\r\n<input type=hidden name="IW_height">\r\n<input type=hidden name="IW_SessionID_" value="1wqzj1f0vec57r1apfqg51wzs88c">\r\n<input type=hidden name="IW_TrackID_" value="0">\r\n</form></BODY></HTML>'
page = Nokogiri::HTML r
input = page.css('input[name="IW_SessionID_"]').first
puts input[:value]
Answer 1: (score: 0)
Once you are familiar with the tools, this is easy to do:
require 'nokogiri'
doc = Nokogiri::HTML(DATA.read)
doc.at('input[name="IW_SessionID_"]')['value']
# => "1wqzj1f0vec57r1apfqg51wzs88c"
__END__
<HTML>
<BODY>
<form method=post action="/bwtem">
<input type=hidden name="IW_height">
<input type=hidden name="IW_SessionID_" value="1wqzj1f0vec57r1apfqg51wzs88c">
<input type=hidden name="IW_TrackID_" value="0">
</form>
</BODY>
</HTML>
Don't do the following:
page.css('form[name="IW_SessionID_"]')
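css searches for all elements that match the selector. A form is unlikely to contain more than one hidden input with the same name, so at is the more sensible choice. css returns a NodeSet, which is like an array of nodes, so it does not behave the way a single node does: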
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<p>foo</p>
<p>bar</p>
</body>
</html>
EOT
doc.search('p').class # => Nokogiri::XML::NodeSet
doc.at('p').class # => Nokogiri::XML::Element
Calling text on a NodeSet concatenates the text of all its nodes, which causes confusion:
doc.search('p').text # => "foobar"
whereas map(&:text) iterates over the nodes and returns the text of each one:
doc.search('p').map(&:text) # => ["foo", "bar"]
Also note that css(...).first or search(...).first is the same as at or one of its at_* siblings:
doc.search('p').first.to_html # => "<p>foo</p>"
doc.at('p').to_html # => "<p>foo</p>"
For clarity, use at instead of search(...).first.
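Finally, strip your HTML example down to the bare minimum that demonstrates the problem you are asking about. Anything beyond that wastes space and time while we are trying to understand the question.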