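I have to take as input an array of hashes, where each hash describes an HTML tag (its opening and closing positions in the text, and the tag type). I need to generate another array in which the tags are arranged in order.
For example: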
input = [
{start_p: 0, end_p: 100, start_t: '<p>', end_t: '</p>'},
{start_p: 10, end_p: 50, start_t: '<p>', end_t: '</p>'},
{start_p: 0, end_p: 100, start_t: '<span>', end_t: '</span>'},
{start_p: 20, end_p: 30, start_t: '<em>', end_t: '</em>'},
{start_p: 40, end_p: 50, start_t: '<em>', end_t: '</em>'},
{start_p: 50, end_p: 60, start_t: '<em>', end_t: '</em>'},
{start_p: 70, end_p: 80, start_t: '<em>', end_t: '</em>'},
{start_p: 8, end_p: 99, start_t: '<strong>', end_t: '</strong>'}
]
expected_output: [<p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>]
Rather than bare tags, each element of the output should be a hash holding the position and the tag, for example:
{position: 0, tag: '<p>'}
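Most importantly, the ordering must respect the HTML rule that tags never cross: if several tags close at the same position, the one opened last must close first; if one tag closes and another opens at the same position, the closing tag comes first; and so on.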
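The no-crossing rule can be checked mechanically with a stack: every closing tag must match the innermost tag that is still open, and nothing may be left open at the end. A minimal sketch; `well_nested?` is a hypothetical helper, and it assumes bare tags without attributes, as in the example input.

```ruby
# Hypothetical helper: true when every closing tag matches the most recently
# opened, still-unclosed tag and no tag is left open at the end.
# Assumes bare tags such as '<em>' / '</em>' (no attributes).
def well_nested?(tags)
  stack = []
  tags.each do |tag|
    if tag.start_with?('</')
      # '</em>'[2..-2] == 'em'; it must match the innermost open tag
      return false unless stack.pop == tag[2..-2]
    else
      stack.push(tag[1..-2]) # '<em>'[1..-2] == 'em'
    end
  end
  stack.empty?
end

expected = '<p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>'
well_nested?(expected.scan(/<[^>]+>/))            # => true
well_nested?('<p><em></p></em>'.scan(/<[^>]+>/))  # => false
```

For output in the required hash format, pass `output.map { |h| h[:tag] }` instead of scanning a string.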
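This is part of a legacy system, and the input and output formats cannot be changed for now. Also, the input can be very large (hundreds of thousands of elements).
Is there any better solution than brute-force recursion?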
Answer 0 (score: 1)
input.group_by { |h| h[:start_p] }.
  values.
  flat_map do |a|
    x = 1.0
    a.flat_map do |h|
      x /= 2.0
      [[h[:start_p] += x, h[:start_t]], [h[:end_p] -= x, h[:end_t]]]
    end
  end.sort_by(&:first).map(&:last).join
#=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"
The steps are as follows.
b = input.group_by { |h| h[:start_p] }
#=> { 0=>[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"},
# {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}],
# 10=>[{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}],
# 20=>[{:start_p=>20, :end_p=>30, :start_t=>"<em>", :end_t=>"</em>"}],
# 40=>[{:start_p=>40, :end_p=>50, :start_t=>"<em>", :end_t=>"</em>"}],
# 50=>[{:start_p=>50, :end_p=>60, :start_t=>"<em>", :end_t=>"</em>"}],
# 70=>[{:start_p=>70, :end_p=>80, :start_t=>"<em>", :end_t=>"</em>"}],
# 8=>[{:start_p=> 8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]}
c = b.values
#=> [[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"},
# {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}],
# [{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}],
# ...
# [{:start_p=>8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]]
d = c.flat_map do |a|
  x = 1.0
  a.flat_map do |h|
    x /= 2.0
    [[h[:start_p] += x, h[:start_t]], [h[:end_p] -= x, h[:end_t]]]
  end
end
#=> [[0.5, "<p>"], [99.5, "</p>"], [0.25, "<span>"], [99.75, "</span>"],
# [10.5, "<p>"], [49.5, "</p>"], [20.5, "<em>"], [29.5, "</em>"],
# [40.5, "<em>"], [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"],
# [70.5, "<em>"], [79.5, "</em>"], [8.5, "<strong>"], [98.5, "</strong>"]]
The first four elements (tuples) of d are the most important for understanding the approach I have taken.
e = d.sort_by(&:first)
#=> [[0.25, "<span>"], [0.5, "<p>"], [8.5, "<strong>"], [10.5, "<p>"],
# [20.5, "<em>"], [29.5, "</em>"], [40.5, "<em>"], [49.5, "</p>"],
# [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"], [70.5, "<em>"],
# [79.5, "</em>"], [98.5, "</strong>"], [99.5, "</p>"], [99.75, "</span>"]]
f = e.map(&:last)
#=> ["<span>", "<p>", "<strong>", "<p>", "<em>", "</em>", "<em>", "</p>",
# "</em>", "<em>", "</em>", "<em>", "</em>", "</strong>", "</p>", "</span>"]
f.join
#=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"
I can explain the calculation of d above in more detail if asked.
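One way to see what the calculation of d does: within one group (elements that share start_p), x halves for each element, so later elements in the group open earlier and close later, which makes them the outer tags. The sketch below reproduces that step on three hypothetical tags that all span 0 to 100; it uses `+`/`-` instead of `+=`/`-=` so the input hashes are not modified.

```ruby
# Three hypothetical tags sharing start_p = 0 and end_p = 100, in input order.
group = [
  { start_p: 0, end_p: 100, start_t: '<p>',    end_t: '</p>' },
  { start_p: 0, end_p: 100, start_t: '<span>', end_t: '</span>' },
  { start_p: 0, end_p: 100, start_t: '<div>',  end_t: '</div>' }
]

x = 1.0
events = group.flat_map do |h|
  x /= 2.0 # 0.5 for the first element, then 0.25, then 0.125
  # Shift the opening right and the closing left by the same offset,
  # without writing the shifted values back into h.
  [[h[:start_p] + x, h[:start_t]], [h[:end_p] - x, h[:end_t]]]
end
# events: [[0.5, "<p>"], [99.5, "</p>"], [0.25, "<span>"], [99.75, "</span>"],
#          [0.125, "<div>"], [99.875, "</div>"]]

html = events.sort_by(&:first).map(&:last).join
# => "<div><span><p></p></span></div>"
```

Because x restarts at 1.0 in every group, elements from different groups can still collide: the `<p>` spanning 10 to 50 and the `<em>` spanning 40 to 50 both close at 49.5 in e. `Enumerable#sort_by` does not guarantee a stable order for equal keys, so nothing in the key decides which of `</p>` and `</em>` comes first; in e, `</p>` lands first, which produces the crossing `<em></p></em>` in the joined result.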
Answer 1 (score: 0)
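I'm not sure what you mean by brute-force recursion, but this can be done with sort_by and map. It is a matter of getting the sort_by right so that it satisfies the required HTML rules.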
output = input.sort_by { |hsh| hsh[:start_p] }.map { |x| x.slice(:start_p, :start_t) }
output.each do |h|
  h[:position] = h.delete(:start_p)
  h[:tag] = h.delete(:start_t)
end
Monkey patch for the slice method (Hash#slice is only built in from Ruby 2.5 onward):
module MyExtension
  module Hash
    def slice(*keys)
      ::Hash[[keys, self.values_at(*keys)].transpose]
    end
  end
end
Hash.include MyExtension::Hash
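Carrying the sort_by idea through to closing tags, the HTML rules from the question can be encoded in a composite sort key, so one sort over 2n events (O(n log n), no recursion) produces the output directly in the required `{position:, tag:}` format. A minimal sketch, not taken from either answer; `order_tags` is a hypothetical name, and it assumes that start_p < end_p for every element, that elements are properly nested or disjoint (touching is fine), and that elements with identical start_p and end_p nest in input order (earlier element outside), which is what expected_output in the question shows.

```ruby
# Hypothetical helper; assumes start_p < end_p and properly nested/disjoint input.
def order_tags(input)
  events = []
  input.each_with_index do |h, i|
    s = h[:start_p]
    e = h[:end_p]
    # Opening tag: kind 1 sorts after any closing tag at the same position;
    # -e puts the element that ends later (the outer one) first;
    # i keeps input order for elements with identical spans.
    events << [[s, 1, -e, i], s, h[:start_t]]
    # Closing tag: kind 0 sorts before any opening tag at the same position;
    # -s and then -i close the most recently opened element first.
    events << [[e, 0, -s, -i], e, h[:end_t]]
  end
  # The kind flag separates opens from closes and i / -i separates events of
  # the same kind, so no two keys are equal and sort_by's lack of stability
  # cannot reorder anything.
  events.sort_by(&:first).map { |_key, pos, tag| { position: pos, tag: tag } }
end

input = [
  { start_p: 0,  end_p: 100, start_t: '<p>',      end_t: '</p>' },
  { start_p: 10, end_p: 50,  start_t: '<p>',      end_t: '</p>' },
  { start_p: 0,  end_p: 100, start_t: '<span>',   end_t: '</span>' },
  { start_p: 20, end_p: 30,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 40, end_p: 50,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 50, end_p: 60,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 70, end_p: 80,  start_t: '<em>',     end_t: '</em>' },
  { start_p: 8,  end_p: 99,  start_t: '<strong>', end_t: '</strong>' }
]

output = order_tags(input)
output.map { |h| h[:tag] }.join
# => "<p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>"
```

Ruby compares the key arrays element by element, so the position decides first, then the close-before-open kind, then the nesting tie-breakers. At position 50, for example, this yields `</em>`, then `</p>`, then `<em>`.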