Question

I'm using Nokogiri to scrape HTML:

doc = Nokogiri::HTML(open('.../search?q=foo'))
doc.xpath('//li[@class="xxx"]/h2/a').each do |row|
  puts row.at_xpath('text()')
end

XML fragment:

<a href="http://www.foo.org/"><strong>Foo</strong>, Inc.</a>

I want the text "Foo, Inc.".

text() returns ", Inc.", while node() returns <strong>Foo</strong>.

What am I missing?

Answer 1

After playing around with your code:

[1] pry(main)> require 'nokogiri'
=> true
[2] pry(main)> doc = Nokogiri::HTML.parse('<a href="http://www.foo.org/"><strong>Foo</strong>, Inc.</a>')
=> #(Document:0x50978d8 {
  name = "document",
  children = [
    #(DTD:0x50891a2 { name = "html" }),
    #(Element:0x507f71a {
      name = "html",
      children = [
        #(Element:0x5070c7e {
          name = "body",
          children = [
            #(Element:0x5023208 {
              name = "a",
              attributes = [ #(Attr:0x501dec0 { name = "href", value = "http://www.foo.org/" })],
              children = [ #(Element:0x4f7392a { name = "strong", children = [ #(Text "Foo")] }), #(Text ", Inc.")]
              })]
          })]
      })]
  })
[3] pry(main)> doc.at_xpath("//a").text
=> "Foo, Inc."
[4] pry(main)> doc.at_xpath("//a/text()").to_s
=> ", Inc."
[5] pry(main)>

I would say the following will work:

doc.xpath('//li[@class="xxx"]/h2/a').each do |row|
  puts row.text
end
