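Is there a way to print a Nokogiri::HTML::Document object in a readable format (not as HTML)? I would like to see the object's nested structure indented level by level, the way awesome_print does it (I tried awesome_print, but it did not work).
I am currently running the following in the console to instantiate the Nokogiri object: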
irb(main):105:0> html = open("http://www.google.com")
=> #<Tempfile:/var/folders/kx/nwfjzgd153g071ykz0mtgd0r0000gp/T/open-uri20131225-35224-y57yx3>
irb(main):106:0> document = Nokogiri::HTML(html.read)
It produces the following hard-to-read blob:
=> #<Nokogiri::HTML::Document:0x3ff87d83d7d0 name="document" children=[#<Nokogiri::XML::DTD:0x3ff87d83d2f8 name="html">, #<Nokogiri::XML::Element:0x3ff87d83cf10 name="html" attributes=[#<Nokogiri::XML::Attr:0x3ff87d83ce98 name="itemscope">, #<Nokogiri::XML::Attr:0x3ff87d83ce84 name="itemtype" value="http://schema.org/WebPage">] children=[#<Nokogiri::XML::Element:0x3ff87d83c77c name="head" children=[#<Nokogiri::XML::Element:0x3ff87d83c4c0 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d83c434 name="content" value="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.">, #<Nokogiri::XML::Attr:0x3ff87d83c420 name="name" value="description">]>, #<Nokogiri::XML::Element:0x3ff87d83371c name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d8335b4 name="content" value="noodp">, #<Nokogiri::XML::Attr:0x3ff87d8335a0 name="name" value="robots">]>, #<Nokogiri::XML::Element:0x3ff87d8325c4 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d832510 name="itemprop" value="image">, #<Nokogiri::XML::Attr:0x3ff87d8324e8 name="content" value="/images/google_favicon_128.png">]>, #<Nokogiri::XML::Element:0x3ff87d82f964 name="title" children=[#<Nokogiri::XML::Text:0x3ff87d82f6d0 "Google">]>, #<Nokogiri::XML::Element:0x3ff87d82f478 name="script" children=[#<Nokogiri::XML::CDATA:0x3ff87d82f248 "(function(){\nwindow.google=
..... (this goes on for a while)
Preferred output:
<Nokogiri::HTML::Document:0x3ff87d83d7d0 name="document" ...
<Nokogiri::XML::Element:0x3ff87d83cf10 name="html" ...
<Nokogiri::XML::Attr:0x3ff87d83ce84 name="itemtype" value="http://schema.org/WebPage">] ...
<Nokogiri::XML::Element:0x3ff87d82f964 name="title" ...
...
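Thanks!
Answer 0 (score: 3)
You can use Nokogiri::HTML::Document#to_html to print a Nokogiri::HTML::Document object.
Because Nokogiri::HTML::Document extends Nokogiri::XML::Document, which in turn extends Nokogiri::XML::Node, you also have several serializing options that use SaveOptions to output to different formats.
So: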
> document = Nokogiri::HTML(html.read)
> puts document.to_html
Answer 1 (score: 0)
Use the awesome_print gem:
$ gem install awesome_print
$ irb
require 'open-uri'
require 'nokogiri'
require 'awesome_print'
html = URI.open("http://www.google.com")  # plain open() no longer fetches URLs as of Ruby 3.0
document = Nokogiri::HTML(html.read)
ap document
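Unlike Nokogiri's to_html method, this also gives you indentation and syntax highlighting. It is not perfect, but it is more useful than the default printed output.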
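If the goal is the one-line-per-node outline sketched in the question, a small recursive helper is another option. The following is a minimal sketch, not part of Nokogiri or awesome_print: `node_outline` is a hypothetical name, and it assumes only that each node responds to `name` and `children`, and that element nodes respond to `attribute_nodes`, as Nokogiri nodes do.

```ruby
# Return one line per node, indented by depth, in the spirit of the
# "preferred output" in the question.
def node_outline(node, depth = 0, lines = [])
  # Only element nodes carry attributes; skip the call for everything else
  # (documents, DTDs, text nodes).
  attrs = node.respond_to?(:element?) && node.element? ? node.attribute_nodes : []
  attr_text = attrs.map { |a| " #{a.name}=#{a.value.inspect}" }.join
  lines << "#{'  ' * depth}#{node.class} name=#{node.name.inspect}#{attr_text}"
  # Recurse into children, one indentation level deeper.
  node.children.each { |child| node_outline(child, depth + 1, lines) }
  lines
end

# Usage with the document from the question:
#   puts node_outline(document)
```

Collecting the lines into an array instead of printing directly makes it easy to truncate the outline (for example `node_outline(document).first(20)`) when the page is large.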