Is there a way to pretty-print a Nokogiri::HTML::Document object?

Asked: 2013-12-25 22:24:00

Tags: ruby nokogiri

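Is there a way to output a Nokogiri::HTML::Document object in a pretty format (the object itself, not the HTML)? I would like to see the object with its deeper levels indented, much like what awesome_print does (I tried it, and it did not work). Thanks!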

I am currently running the following commands in the console to instantiate the Nokogiri object:

irb(main):105:0> html = open("http://www.google.com")
=> #<Tempfile:/var/folders/kx/nwfjzgd153g071ykz0mtgd0r0000gp/T/open-uri20131225-35224-y57yx3>
irb(main):106:0> document = Nokogiri::HTML(html.read)

It produces the following hard-to-read blob:

=> #<Nokogiri::HTML::Document:0x3ff87d83d7d0 name="document" children=[#<Nokogiri::XML::DTD:0x3ff87d83d2f8 name="html">, #<Nokogiri::XML::Element:0x3ff87d83cf10 name="html" attributes=[#<Nokogiri::XML::Attr:0x3ff87d83ce98 name="itemscope">, #<Nokogiri::XML::Attr:0x3ff87d83ce84 name="itemtype" value="http://schema.org/WebPage">] children=[#<Nokogiri::XML::Element:0x3ff87d83c77c name="head" children=[#<Nokogiri::XML::Element:0x3ff87d83c4c0 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d83c434 name="content" value="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.">, #<Nokogiri::XML::Attr:0x3ff87d83c420 name="name" value="description">]>, #<Nokogiri::XML::Element:0x3ff87d83371c name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d8335b4 name="content" value="noodp">, #<Nokogiri::XML::Attr:0x3ff87d8335a0 name="name" value="robots">]>, #<Nokogiri::XML::Element:0x3ff87d8325c4 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d832510 name="itemprop" value="image">, #<Nokogiri::XML::Attr:0x3ff87d8324e8 name="content" value="/images/google_favicon_128.png">]>, #<Nokogiri::XML::Element:0x3ff87d82f964 name="title" children=[#<Nokogiri::XML::Text:0x3ff87d82f6d0 "Google">]>, #<Nokogiri::XML::Element:0x3ff87d82f478 name="script" children=[#<Nokogiri::XML::CDATA:0x3ff87d82f248 "(function(){\nwindow.google=
... (this goes on for a while)

Preferred output:

<Nokogiri::HTML::Document:0x3ff87d83d7d0 name="document" ...
  <Nokogiri::XML::Element:0x3ff87d83cf10 name="html"  ...
    <Nokogiri::XML::Attr:0x3ff87d83ce84 name="itemtype" value="http://schema.org/WebPage">] ...
    <Nokogiri::XML::Element:0x3ff87d82f964 name="title" ...
...

Thanks!

2 answers:

Answer 0 (score: 3)

You can use Nokogiri::HTML::Document#to_html to print a Nokogiri::HTML::Document object.

Because Nokogiri::HTML::Document extends Nokogiri::XML::Document, which in turn extends Nokogiri::XML::Node, you also have several serializing options that use SaveOptions to output to different formats.

So:

> document = Nokogiri::HTML(html.read)
> puts document.to_html

Answer 1 (score: 0)

Use the awesome_print gem:

$ gem install awesome_print
$ irb

require 'open-uri'
require 'nokogiri'
require 'awesome_print'

# Kernel#open no longer accepts URLs as of Ruby 3.0; URI.open (Ruby 2.5+) is the current form.
html = URI.open("http://www.google.com")
document = Nokogiri::HTML(html.read)

ap document

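Unlike Nokogiri's to_html method, this also gives you indentation and syntax highlighting. It is not perfect, but it is more useful than the default printout.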