I'm using Nokogiri to scrape HTML:
doc = Nokogiri::HTML(open('.../search?q=foo'))
doc.xpath('//li[@class="xxx"]/h2/a').each do |row|
puts row.at_xpath('text()')
end
The XML snippet:
<a href="http://www.foo.org/"><strong>Foo</strong>, Inc.</a>
I want the text "Foo, Inc.", but text() returns ", Inc." and node() returns <strong>Foo</strong>.
What am I missing?
Answer 0 (score: 1)
After playing around with your code:
[1] pry(main)> require 'nokogiri'
=> true
[2] pry(main)> doc = Nokogiri::HTML.parse('<a href="http://www.foo.org/"><strong>Foo</strong>, Inc.</a>')
=> #(Document:0x50978d8 {
name = "document",
children = [
#(DTD:0x50891a2 { name = "html" }),
#(Element:0x507f71a {
name = "html",
children = [
#(Element:0x5070c7e {
name = "body",
children = [
#(Element:0x5023208 {
name = "a",
attributes = [ #(Attr:0x501dec0 { name = "href", value = "http://www.foo.org/" })],
children = [ #(Element:0x4f7392a { name = "strong", children = [ #(Text "Foo")] }), #(Text ", Inc.")]
})]
})]
})]
})
[3] pry(main)> doc.at_xpath("//a").text
=> "Foo, Inc."
[4] pry(main)> doc.at_xpath("//a/text()").to_s
=> ", Inc."
[5] pry(main)>
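text() only selects the text nodes that are direct children of <a>, so it skips the "Foo" inside <strong>. Node#text returns the text of all descendants, so I'd say the following should work: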
doc.xpath('//li[@class="xxx"]/h2/a').each do |row|
puts row.text
end