Regex to capture the text blocks after certain tags

Posted: 2018-04-20 09:37:08

Tags: ruby-on-rails ruby regex

I have HTML like this:

<h3>What to bring</h3>
<p><p>It's important to bring good walking shoes.  You never know when you will be out walking and there's a decent chance of rain.</p></p>
<h3>How to get there</h3>
It is reachable by many ways: it lies in the visually stunning nature park.
<h3>What not to forget</h3>
Walking shoes!

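How can I split this into descriptions and content in Rails? The descriptions come from the h3 tags.

I already have a regex that extracts the headings: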

description.scan(/<h3>(.*?)<\/h3>/).flatten
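To show why the .flatten is needed, here is a minimal sketch that runs that scan on the sample HTML, assuming it is held in a string named description (written here as a heredoc). With one capture group, String#scan returns one single-element array per match.

```ruby
# Sample HTML from the question, stored in a string named description
description = <<~HTML
  <h3>What to bring</h3>
  <p><p>It's important to bring good walking shoes.  You never know when you will be out walking and there's a decent chance of rain.</p></p>
  <h3>How to get there</h3>
  It is reachable by many ways: it lies in the visually stunning nature park.
  <h3>What not to forget</h3>
  Walking shoes!
HTML

# With one capture group, scan returns one single-element array per match
headings_nested = description.scan(/<h3>(.*?)<\/h3>/)
# => [["What to bring"], ["How to get there"], ["What not to forget"]]

# flatten turns that into a plain list of heading strings
headings = headings_nested.flatten
# => ["What to bring", "How to get there", "What not to forget"]
```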

But how do I extract the content of each block? Basically, I'm looking for an array of the 3 text blocks:

["<p><p>It's important to bring good walking shoes.  You never know when you will be out walking and there's a decent chance of rain.</p></p>","It is reachable by many ways: it lies in the visually stunning nature park.","Walking shoes!"]

The text blocks can span multiple lines.

1 answer:

Answer (score: 1)

You can split on this regex:

description.split(/<h3>.*?<\/h3>/)
# => ["",
#     "\n<p><p>It's important to bring good walking shoes.  You never know when you will be out walking and there's a decent chance of rain.</p></p>\n",
#     "\nIt is reachable by many ways: it lies in the visually stunning nature park.\n",
#     "\nWalking shoes!\n"]
# The leading "" is the (empty) text before the first <h3>; split keeps leading empty fields.
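A minimal runnable sketch that goes from this split to exactly the array asked for in the question. It assumes the HTML is held in a string named description (written here as a heredoc). The .map(&:strip).reject(&:empty?) cleanup is an addition, not part of the original answer: strip removes the newlines around each block, and reject drops the empty string that split returns for the text before the first <h3>.

```ruby
description = <<~HTML
  <h3>What to bring</h3>
  <p><p>It's important to bring good walking shoes.  You never know when you will be out walking and there's a decent chance of rain.</p></p>
  <h3>How to get there</h3>
  It is reachable by many ways: it lies in the visually stunning nature park.
  <h3>What not to forget</h3>
  Walking shoes!
HTML

# No capture group in the regex, so the headings themselves are discarded
parts = description.split(/<h3>.*?<\/h3>/)

# Trim the surrounding "\n" from each block and drop the leading ""
blocks = parts.map(&:strip).reject(&:empty?)
```

Note that reject(&:empty?) would also drop a genuinely empty section (two headings in a row); if those must be kept, use parts.drop(1) instead, assuming nothing precedes the first heading.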

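Make sure you remove the capture group from between the tags, i.e. write .*? rather than (.*?). Otherwise split also returns the captured headings in the array.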
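Conversely, keeping the group can be handy if you want headings and content together. A sketch, assuming the same description string and that nothing but whitespace precedes the first <h3>: split with the group interleaves each heading with the block after it, and each_slice(2) pairs them into a Hash. The name sections is illustrative only.

```ruby
description = <<~HTML
  <h3>What to bring</h3>
  <p><p>It's important to bring good walking shoes.  You never know when you will be out walking and there's a decent chance of rain.</p></p>
  <h3>How to get there</h3>
  It is reachable by many ways: it lies in the visually stunning nature park.
  <h3>What not to forget</h3>
  Walking shoes!
HTML

# Keeping the group makes split interleave each heading with the block after it:
# ["", "What to bring", "\n<p>...\n", "How to get there", "\n...\n", "What not to forget", "\nWalking shoes!\n"]
parts = description.split(/<h3>(.*?)<\/h3>/)

# Drop the leading "" (text before the first <h3>), then pair [heading, body].
# to_s guards against a nil body when the last heading has no text after it.
sections = parts.drop(1).each_slice(2).map { |heading, body| [heading, body.to_s.strip] }.to_h
```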

See the String#split docs:

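split(pattern=nil, [limit]) → an_array

Divides str into substrings based on a delimiter, returning an array of these substrings.

(...)

If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.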
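A tiny sketch of the quoted Regexp rules on made-up strings (not from the question):

```ruby
# Regex without a group: the string is divided where the pattern matches
plain = "a1b2c".split(/\d/)         # => ["a", "b", "c"]

# Regex with a group: the captured separators are returned as well
with_group = "a1b2c".split(/(\d)/)  # => ["a", "1", "b", "2", "c"]

# Pattern that matches a zero-length string: split into single characters
chars = "abc".split(//)             # => ["a", "b", "c"]
```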