Question

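I want to extract the header tags, along with their id and text, from a string in Ruby on Rails. I tried Nokogiri::XML(content), but I don't know how to pull the header tags out of it. The order of the headers in the string must also be preserved. If I do doc.css('h1').each do |h1|, it returns all the h1 tags together, so the order is lost. For example: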

<h1 id='h1'>Header1</h1>
<h3 id='h3'>Header3</h3>
<h2 id='h2'>Header2</h2>
<h5 id='h5'>Header5</h5>
<h6 id='h6'>Header6</h6>
<h3 id='h33'>Header3</h3>
<h4 id='h4'>Header4</h4>
<h2 id='h22'>Header2</h2>
<h6 id='h66'>Header6</h6>

The result should be:

headers = ["h1", "h3", "h2", "h5", "h6", "h3", "h4", "h2", "h6"]
toc = [{'node':'h1', 'value':'Header1', 'id':'h1' }, {'node':'h3', 'value':'Header3', 'id':'h3' }, {'node':'h2', 'value':'Header2', 'id':'h2' }, {'node':'h5', 'value':'Header5', 'id':'h5' }, {'node':'h6', 'value':'Header6', 'id':'h6' }, {'node':'h3', 'value':'Header3', 'id':'h33' }, {'node':'h4', 'value':'Header4', 'id':'h4' }, {'node':'h2', 'value':'Header2', 'id':'h22' }, {'node':'h6', 'value':'Header6', 'id':'h66' }]

My code:

doc = Nokogiri::XML(content)

Please help me solve this.

Answer 1

I would do it like this:

html_string = <<-html
<h1 id='h1'>Header1</h1>
<h3 id='h3'>Header3</h3>
<h2 id='h2'>Header2</h2>
<h5 id='h5'>Header5</h5>
<h6 id='h6'>Header6</h6>
<h3 id='h33'>Header3</h3>
<h4 id='h4'>Header4</h4>
<h2 id='h22'>Header2</h2>
<h6 id='h66'>Header6</h6>
html

require 'nokogiri'

doc = Nokogiri::HTML(html_string)
# Build the list of tag names to search for. Each name is also a valid
# CSS selector, so the array can be passed to css as a list of rules.
header_tags = (1..6).map { |num| "h#{num}" }
# => ["h1", "h2", "h3", "h4", "h5", "h6"]
headers = []
# css may return the matches grouped by rule, so sort puts the nodes back
# in document order (Node#<=> compares positions within the document).
toc = doc.css(*header_tags).sort.map do |node|
  headers << node.name
  {'node' => node.name, 'value' => node.text, 'id' => node['id'] }
end

If you look at the documentation for the method css(*rules), you will find:

Search this node for CSS rules. rules must be one or more CSS selectors.

Answer 2

A shorter answer: just call the sort method, and the results come back in the order in which they appear in the source.

heads = Nokogiri::HTML(object.body).css('h1, h2, h3, h4, h5, h6').sort()

Extract header tags from a string in Ruby on Rails
