Question

I wrote the following code:

require 'nokogiri'
require 'pp'

html = <<-END
<html>

    <head>

    <title> A Dirge </title>

    <link rel     = "schema.DC"
          href    = "http://purl.org/DC/elements/1.0/">

    <meta name    = "DC.Title"
          content = "A Dirge">

    <meta name    = "DC.Creator"
          content = "Shelley, Percy Bysshe">

    <meta name    = "DC.Type"
          content = "poem">

    <meta name    = "DC.Date"
          content = "1820">

    <meta name    = "DC.Format"
          content = "text/html">

    <meta name    = "DC.Language"
          content = "en">

    </head>

    <body><pre>

            Rough wind, that moanest loud
              Grief too sad for song;
            Wild wind, when sullen cloud
              Knells all the night long;
            Sad storm, whose tears are vain,
            Bare woods, whose branches strain,
            Deep caves and dreary main, -
              Wail, for the world's wrong!

    </pre></body>

    </html>
 END

doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc 
doc.children.each do |ch|
    p ch.text if ch.text?
end

But it outputs:

"\n\n    \n\n    "
"\n\n    "

Now my question is: why are the lines inside <pre> ... </pre> not printed?

Can anyone help me solve this?

Answer 1

For me, the doc.children.each block prints more than it does for you:

"\n\n    \n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    "
"\n\n    \n\n    "
"\n\n    \n"

This is the correct output; these are the text nodes that are direct children of <html>.

I'm not sure which "lines" you expected to see but did not. For example, if you want the contents of <pre>, you can run

doc.xpath("pre").text

to get it. If this does not answer your question, you will have to clarify it.

Confused by Nokogiri::XML::Text#text output
