Given this HTML:
<li><strong><a href="http://www.ukasta.org.uk/">United Kingdom Agricultural Supply Trade Association</a> (UKASTA)</strong></li>
I want to get the strings "United Kingdom Agricultural Supply Trade Association" and "(UKASTA)".
Using Nokogiri, I wrote:
linklist = link.parent.parent.css('li strong a')
linklist.each do |f|
  puts f.text
end
f.text is "United Kingdom Agricultural Supply Trade Association", but how do I get "(UKASTA)"?
Answer 0 (score: 3)
You're diving too deep. I'd use:
require 'nokogiri'
html = '<li><strong><a href="http://www.ukasta.org.uk/">United Kingdom Agricultural Supply Trade Association</a> (UKASTA)</strong></li>'
doc = Nokogiri::HTML(html)
doc.at('strong').text
which returns:
"United Kingdom Agricultural Supply Trade Association (UKASTA)"
If you must locate the <a> node first, you can reach "(UKASTA)" through its next sibling, which is the text node that follows the link:
a_node = doc.at('a')
a_node.text
=> "United Kingdom Agricultural Supply Trade Association"
a_node.next_sibling.text
=> " (UKASTA)"
Answer 1 (score: 2)
You can use the children method and then pick out the data by position:
require 'nokogiri'
html_doc = Nokogiri::HTML('<html><li><strong><a href="">United Kingdom Agricultural Supply Trade Association</a>(UKASTA)</strong></li></html>')
html_doc.css('li strong').children[0].text
=> "United Kingdom Agricultural Supply Trade Association"
html_doc.css('li strong').children[1].text
=> "(UKASTA)"