How do I parse this HTML with Nokogiri?

Asked: 2013-04-25 09:57:25

Tags: ruby html-parsing nokogiri

Given this HTML:

<li><strong><a href="http://www.ukasta.org.uk/">United Kingdom Agricultural Supply Trade Association</a> (UKASTA)</strong></li>

I want to get the string "United Kingdom Agricultural Supply Trade Association (UKASTA)".

Using Nokogiri, I wrote:

linklist=link.parent.parent.css('li strong a')
linklist.each do |f|
  puts f.text
end

f.text is "United Kingdom Agricultural Supply Trade Association", but how do I get "(UKASTA)"?

2 answers:

Answer 0 (score: 3)

You're drilling down too far. I'd use:

require 'nokogiri'

html = '<li><strong><a href="http://www.ukasta.org.uk/">United Kingdom Agricultural Supply Trade Association</a> (UKASTA)</strong></li>'
doc = Nokogiri::HTML(html)
doc.at('strong').text

Which returns:

"United Kingdom Agricultural Supply Trade Association (UKASTA)"

If you have to locate the <a> node, you can reach "(UKASTA)" through its next sibling:

a_node = doc.at('a')
a_node.text
# => "United Kingdom Agricultural Supply Trade Association"
a_node.next_sibling.text
# => " (UKASTA)"

Answer 1 (score: 2)

You can use the children method and then pick out the data by position:

require 'nokogiri'

html_doc = Nokogiri::HTML('<html><li><strong><a href="">United Kingdom Agricultural Supply Trade Association</a>(UKASTA)</strong></li></html>')

html_doc.css('li strong').children[0].text
# => "United Kingdom Agricultural Supply Trade Association"
html_doc.css('li strong').children[1].text
# => "(UKASTA)"