I have a string like this:
lorep ipsum <a href="#" class="link-1">dolor sit</a>amet, consectetur <a href="#" class="link-2">adipiscing</a> elit.
I need to split it into fragments, but keep the link class for the fragments that sit inside anchors. The ideal result would be:
['lorep ipsum ', {'link-1' => 'dolor sit'}, 'amet, consectetur', {'link-2' => 'adipiscing'}, ' elit.']
Or:
['lorep ipsum ', ['link-1', 'dolor sit'], 'amet, consectetur', ['link-2', 'adipiscing'], ' elit.']
I tried this code:
string.split(/<[^>]>/)
But it returns only an array of fragments, without the link classes.
Answer 0 (score: 0)
I would use Nokogiri:
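require 'nokogiri'
doc = Nokogiri::HTML.parse <<-eot
lorep ipsum <a href="#" class="link-1">dolor sit</a>amet, consectetur <a href="#" class="link-2">adipiscing</a> elit.
eot
# For each anchor, collect the text before it, a {class => text} hash, and the text after it.
# uniq then drops the fragment that two neighbouring anchors share.
ary = doc.search("//a").flat_map do |n|
  [n.previous_sibling.text.strip, {n['class'] => n.text.strip}, n.next_sibling.text.strip]
end.uniq
p ary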
Output
["lorep ipsum", {"link-1"=>"dolor sit"}, "amet, consectetur", {"link-2"=>"adipiscing"}, "elit."]
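
Two caveats about the code above: strip removes the leading and trailing spaces the question wanted to keep, and uniq would also drop any fragment that happens to appear twice in the text. If those matter, here is a rough sketch that walks the string once, in order, using only the standard library. It assumes the markup is as simple as in the question (non-nested anchors with a double-quoted class attribute); for arbitrary HTML a real parser is the safer choice. The names ANCHOR and split_with_link_classes are made up for this example.

```ruby
# Hypothetical helper, assuming simple, non-nested <a ... class="..."> tags.
ANCHOR = /<a\b[^>]*\bclass="([^"]*)"[^>]*>(.*?)<\/a>/m

def split_with_link_classes(html)
  pieces = []
  pos = 0
  html.scan(ANCHOR) do |klass, text|
    match = Regexp.last_match
    # Plain text between the previous anchor (or the start) and this one.
    before = html[pos...match.begin(0)]
    pieces << before unless before.empty?
    # The anchor itself, stored as {class => inner text}.
    pieces << { klass => text }
    pos = match.end(0)
  end
  # Whatever follows the last anchor.
  rest = html[pos..-1]
  pieces << rest unless rest.empty?
  pieces
end

string = 'lorep ipsum <a href="#" class="link-1">dolor sit</a>amet, consectetur <a href="#" class="link-2">adipiscing</a> elit.'
p split_with_link_classes(string)
# => ["lorep ipsum ", {"link-1"=>"dolor sit"}, "amet, consectetur ", {"link-2"=>"adipiscing"}, " elit."]
```

Whitespace is kept exactly as it is in the input, so the middle fragment ends with a space. To get the second format from the question, push [klass, text] instead of the hash.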