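Nokogiri: how to wrap HTML tags around the nodes matched by a given XPath?

Time: 2009-10-18 05:20:09

Tags: ruby nokogiri

I have an XPath that grabs every text node that is not wrapped in any HTML tag; instead, these nodes are separated by <br>. I want to wrap each of them in a <span> tag.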

Nokogiri::HTML(open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
.xpath("//br/following-sibling::text()|//br/preceding-sibling::text()").to_a

This returns those text nodes.

Here is the complete revised code:

doc = Nokogiri::HTML(open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
.xpath("//br/following-sibling::text()|//br/preceding-sibling::text()").wrap("<span></span>")
puts doc

I expected to see the full HTML source with the text wrapped in <span> tags, but I got the following instead:

Date: 2009-10-17,  4:36PM PDT
Reply to:
This is a spectacular open plan 1000 sq. ft. loft is in a former Canada Post building. Upon entering the loft from the hallway you are amazed at where you have arrived.... a stunning, bright and fully renovated apartment that retains its industrial feel. The restoration of the interior was planned and designed by a famous Vancouver architect.
The loft is above a police station, so you're guaranteed peace and quite at any time of the day or night.
The neighborhood is safe and lively with plenty of restaurants and shopping. There's a starbucks across the street and plenty of other coffee shops in the area.  Antique alley with its hidden treasures is one block away, as well as the beautiful mile long boardwalk. Skytrain station is one minute away (literally couple of buildings away). 15 minutes to Commercial drive, 20 minutes to downtown Vancouver and Olympic venues.
Apartment Features:
-       Fully furnished
-       14 ft ceilings
-       Hardwood floors
-       Gas fireplace
-       Elevator
-       Large rooftop balcony
-       Full Kitchen: Fully equipped with crystal, china and utensils
-       Dishwasher
-       Appliances including high-end juice maker, blender, etc.
-       WiFi (Wireless Internet)
-       Bathtub
-       Linens &amp; towels provided
-       Hair dryer
-       LCD Flat-screen TV with DVD player
-       Extensive DVD library
-       Music Library: Ipod connection
-       Wii console with Guitar Hero, games
-       Book and magazine library
-       Non-smoking
We are looking to exchange for a place somewhere warm (California, Hawaii, Mexico, South America, Central America) or a place in Europe (UK, Italy, France).
Email for other dates and pictures of the loft.

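1 answer:

Answer 0: (score: 4)

Your doc variable is not assigned the whole document; it holds the NodeSet returned by the chained xpath(...).wrap(...) call, so puts prints only the matched text nodes. You should use: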

doc = Nokogiri::HTML(open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
doc.xpath("//br/following-sibling::text()|//br/preceding-sibling::text()").wrap("<span></span>")
puts doc

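Unfortunately, that does not solve the problem, because Nokogiri places all the <br> elements first and then all the span-wrapped text, like this: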

<br><br><br><br><span>
text</span><span>
text</span><span>
text</span><span>
text</span>

But you can do this instead:

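require 'nokogiri'
require 'open-uri'

# URI.open replaces the bare open(url) call, which Ruby 3.0 no longer supports
doc = Nokogiri::HTML(URI.open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
# Replace each matched text node with a <span> element that contains it
doc.search("//br/following-sibling::text()|//br/preceding-sibling::text()").each do |node|
  node.replace(Nokogiri.make("<span>#{node.to_html}</span>"))
end
puts doc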