Question

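I'm new to Nokogiri, but it looks like it will be the tool I use to scrape web pages. I'm looking for specific words on a page: "Valid", "Requirements Met", and "Requirements Not". I'm using Watir to drive through the site. Currently I have: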

page = Nokogiri::HTML.parse(browser.html)

to get the HTML, but I don't know where to go from there.

Thanks for your help!

Answer 1

If you're using Watir to drive the site, I'd suggest using Watir to check the text as well. You can get all the text on the page with:

ie.text      #Where ie is a Watir::IE

Then you can check whether it contains those words by matching it against a regular expression:

if ie.text =~ /Valid|Requirements Met|Requirements Not/
  #Do something if the words are on the page
end
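If you also need to know which of the phrases appeared, not just whether any did, String#scan returns every match. The sketch below assumes the page text has already been read into a string; the status_phrases helper and the sample text are hypothetical. It also wraps the alternation in \b word boundaries so that a longer word such as "Validated" is not counted as "Valid", which the plain pattern above would allow.

```ruby
# Non-capturing group so that scan returns the whole matched phrase;
# \b on both sides stops "Valid" from matching inside "Validated".
STATUS_PATTERN = /\b(?:Valid|Requirements Met|Requirements Not)\b/

# Hypothetical helper: returns each status phrase in order of appearance.
def status_phrases(page_text)
  page_text.scan(STATUS_PATTERN)
end

# Made-up sample standing in for ie.text / browser.text.
sample = "Step 1: Valid\nStep 2: Requirements Met\nStep 3: Validated later"
p status_phrases(sample)   # => ["Valid", "Requirements Met"]
```

Drop the \b anchors if the phrases can legitimately appear glued to other word characters on your pages.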

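That said, if you're looking for specific bits of text, you can use Watir to locate those elements directly (and avoid parsing the text or HTML at all). If you can post a sample of the HTML you're working with, we can help you find a more robust solution.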

Answer 2

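I'm not sure why you're using both. If you only want to look at the text, you could fetch the page with net/http or Mechanize instead. Either way, you can check the text in Watir with browser.text.match 'Valid', and the same works in Nokogiri with page.text.match 'Valid'.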
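A quick sketch of how match behaves, using a made-up string in place of browser.text or page.text. String#match returns a MatchData object or nil, so the result can be used directly as an if condition; String#match? (Ruby 2.4 and later) returns plain true or false. Because a String argument is converted to a regular expression, characters such as "." keep their regex meaning.

```ruby
# Hypothetical sample standing in for browser.text (Watir) or page.text (Nokogiri).
text = "Status: Requirements Met"

# match returns a MatchData on success, nil otherwise.
m = text.match("Requirements Met")
puts m[0] if m                    # prints Requirements Met
puts text.match("Valid").inspect  # prints nil

# match? returns a boolean and does not set $~ (Ruby 2.4+).
puts text.match?("Valid")         # prints false

# The String pattern is converted to a Regexp, so "." matches any character,
# here the ":" after "Status".
puts text.match?("Status. Req")   # prints true
```

If the phrases you search for might contain regex metacharacters, pass a Regexp built with Regexp.escape instead of a bare String.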

Answer 3

You should also be able to use the .text method from Justin's answer together with Ruby's standard String#include? method, which returns true or false.

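# String#include? takes a String, not a Regexp, so check each phrase in turn
page_text = browser.text
if ["Valid", "Requirements Met", "Requirements Not"].any? { |phrase| page_text.include?(phrase) }
  # code to execute if text found
else
  # code to execute if text not found
end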
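String#include? only accepts a String; passing it a Regexp raises a TypeError rather than returning false. The sketch below, using a made-up sample string, shows that failure next to two working alternatives: checking each literal phrase with include?, and matching a regex built with Regexp.union via match? (Ruby 2.4 and later). The PHRASES list and the sample text are illustrative.

```ruby
PHRASES = ["Valid", "Requirements Met", "Requirements Not"]
text = "Application status: Requirements Met"  # hypothetical page text

# Passing a Regexp to String#include? raises instead of matching.
begin
  text.include?(/Valid|Requirements Met|Requirements Not/)
rescue TypeError => e
  puts "include? with a Regexp raised #{e.class}"
end

# Works: include? with each literal phrase.
puts PHRASES.any? { |phrase| text.include?(phrase) }  # prints true

# Also works: a regex built from the same list. Regexp.union escapes each
# string, so the phrases are matched literally.
puts text.match?(Regexp.union(PHRASES))                # prints true

# Which phrases are present and which are not.
found, missing = PHRASES.partition { |phrase| text.include?(phrase) }
p found    # => ["Requirements Met"]
p missing  # => ["Valid", "Requirements Not"]
```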

If you're using a test framework, this also makes for an easy one-line verification step.

If using RSpec/Cucumber:

expect(browser.text).to match(/Valid|Requirements Met|Requirements Not/)

If using Test::Unit:

assert_match(/Valid|Requirements Met|Requirements Not/, browser.text)
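For reference, here is a self-contained test file in that style. It is a sketch that swaps in Minitest, which ships with Ruby as a bundled gem, for Test::Unit; assert_match(pattern, string) takes its arguments in the same order in both. The PAGE_TEXT constant is a hypothetical stand-in for browser.text, which in a real suite would come from the Watir browser after it has loaded the page.

```ruby
require "minitest/autorun"

# Hypothetical stand-in for browser.text.
PAGE_TEXT = "Application status: Requirements Not met"

class StatusTextTest < Minitest::Test
  STATUS = /Valid|Requirements Met|Requirements Not/

  # Passes when any of the status phrases appears in the page text.
  def test_page_shows_a_status_phrase
    assert_match STATUS, PAGE_TEXT
  end
end
```

Run the file with ruby; Minitest's autorun hook executes the test when the script exits.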