I wrote the following code:
require 'nokogiri'
require 'pp'
html = <<-END
<html>
<head>
<title> A Dirge </title>
<link rel = "schema.DC"
href = "http://purl.org/DC/elements/1.0/">
<meta name = "DC.Title"
content = "A Dirge">
<meta name = "DC.Creator"
content = "Shelley, Percy Bysshe">
<meta name = "DC.Type"
content = "poem">
<meta name = "DC.Date"
content = "1820">
<meta name = "DC.Format"
content = "text/html">
<meta name = "DC.Language"
content = "en">
</head>
<body><pre>
Rough wind, that moanest loud
Grief too sad for song;
Wild wind, when sullen cloud
Knells all the night long;
Sad storm, whose tears are vain,
Bare woods, whose branches strain,
Deep caves and dreary main, -
Wail, for the world's wrong!
</pre></body>
</html>
END
doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc
doc.children.each do |ch|
p ch.text if ch.text?
end
But it outputs:
"\n\n \n\n "
"\n\n "
Now my question is: why aren't the lines inside <pre> ... </pre> printed?
Can anyone help me with this?
Answer 0 (score: 1)
The doc.children.each block gives me more output than you got:
"\n\n \n\n "
"\n\n "
"\n\n "
"\n\n "
"\n\n "
"\n\n "
"\n\n "
"\n\n "
"\n\n \n\n "
"\n\n \n"
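That is the correct output: these are the text nodes among the direct children of <html>. Your loop only prints a child for which ch.text? is true, i.e. a text node; <pre> is an element node, so the poem inside it is skipped.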
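I'm not sure which "lines" you expect to see but don't. If, for example, you want the contents of <pre>, you can get them with
doc.xpath("pre").text
If that doesn't answer your question, you'll need to clarify it.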