Question

Given this variable:

str = ' and then there was a gigantic <a href="link.com/bug.jpg">bug</a> on her nose!'

How can I write a function that doesn't just break wherever the character limit happens to fall, as in:

str[0..33] # => " and then there was a gigantic <a "

I'd like something that works well with HTML and, if a tag has been opened, returns everything through its closing tag:

some_function(str) # => ' and then there was a gigantic <a href="link.com/bug.jpg">bug</a>'

I would even settle for something that does a worse job, like:

worse_function(str) # => " and then there was a gigantic"

Any help would be great. Obviously, it needs some rough character limit or word limit.

Update

So far I have this:

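def friendly_excerpt(string, length)
  # First length + 1 words, rejoined with spaces (Array#to_s would give the
  # array's inspect form in Ruby 1.9+, not a sentence)
  excerpt = string.split[0..length].join(' ')
  if excerpt.include?('<') && !excerpt.include?('>')
    # Cut just before the unclosed '<'
    excerpt = excerpt[0...excerpt.index('<')]
  end
  excerpt
end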

Answer 1

I would:

1. Count the "<" characters in the string.
2. Go through every "<".
3. Remove the markup from each "<" up to its ">".

So it would look something like this:

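def remove_html_tag(str)
  result = str
  tag_count = str.count('<')

  # Remove one <...> tag per iteration, once for every '<' in the string
  tag_count.times do
    index_1 = result.index('<')
    break if index_1.nil?
    index_2 = result.index('>', index_1) # first '>' after that '<'
    break if index_2.nil?                # unclosed tag: leave the rest as-is
    # Drop the whole tag, including both '<' and '>'
    result = result[0...index_1] + result[(index_2 + 1)..-1]
  end

  result
end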
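The loop above only removes the tags; it does not shorten the text. As a sketch (not part of the original answer), the same stripping can be done with a single regular-expression gsub, followed by a cut at the last space before the limit. The helper name strip_and_truncate is hypothetical, and the regex assumes no literal ">" appears inside a tag, for example in an attribute value.

```ruby
# Hypothetical helper: strip tags, then truncate on a word boundary.
# Assumes no literal '>' appears inside a tag (e.g. in an attribute value).
def strip_and_truncate(str, limit)
  text = str.gsub(/<[^>]*>/, '').strip # remove every <...> tag in one pass
  return text if text.length <= limit

  # Look at limit + 1 characters so a space right after the limit still counts
  cut = text[0..limit]
  last_space = cut.rindex(' ')
  # Back up to the last space; fall back to a hard cut if there is none
  last_space ? cut[0...last_space].rstrip : text[0, limit]
end

str = ' and then there was a gigantic <a href="link.com/bug.jpg">bug</a> on her nose!'
p strip_and_truncate(str, 31) # => "and then there was a gigantic"
```

Note that this returns plain text, so the link itself is lost; that is the trade-off of stripping tags rather than keeping them balanced.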

Answer 2

I have this solution:

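def friendly_excerpt(string, length)
  # Keep the first length + 1 words, rejoined with single spaces
  excerpt = string.split[0..length].join(' ')
  if excerpt.include?('<') && !excerpt.include?('>')
    # A tag was opened but not closed: drop everything from the '<' onward
    # (slicing with an exclusive range also works when '<' is at index 0)
    excerpt = excerpt[0...excerpt.index('<')]
  end
  excerpt.strip
end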

This seems to work well.
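The question asks for a rough character limit or a word limit, while friendly_excerpt counts words. A character-based variant could add whole words while the excerpt still fits and then apply the same unclosed-tag check. This is only a sketch: friendly_excerpt_chars is a hypothetical name introduced here, not part of the answer.

```ruby
# Hypothetical variant of friendly_excerpt that limits by characters.
def friendly_excerpt_chars(string, max_chars)
  excerpt = ''
  string.split.each do |word|
    candidate = excerpt.empty? ? word : "#{excerpt} #{word}"
    break if candidate.length > max_chars # next word would overflow
    excerpt = candidate
  end
  # Same check as friendly_excerpt: drop an opened-but-unclosed tag
  if excerpt.include?('<') && !excerpt.include?('>')
    excerpt = excerpt[0...excerpt.index('<')]
  end
  excerpt.strip
end

str = ' and then there was a gigantic <a href="link.com/bug.jpg">bug</a> on her nose!'
p friendly_excerpt_chars(str, 33) # => "and then there was a gigantic"
```

Like the original, the tag check only looks for any "<" without any ">", so an excerpt holding one complete tag followed by a second, unclosed one would slip through.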

Answer 3

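If your goal is to truncate a string containing HTML cleanly, and you'd rather not write the function yourself, I suggest the html_truncator gem. It uses Nokogiri to parse the HTML and then handles the truncation properly.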

Example (more on the GitHub page):

HTML_Truncator.truncate("<p>Lorem ipsum dolor sit amet.</p>", 3)
# => "<p>Lorem ipsum dolor…</p>"

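Note that by default it treats the length argument as a number of words rather than characters, but there is an option to count characters instead: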

HTML_Truncator.truncate("<p>Lorem ipsum dolor sit amet.</p>", 12, :length_in_chars => true)
# => "<p>Lorem ipsum…</p>"

Answer 4

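The minute I see HTML, I turn to Nokogiri, because I can't get opening and closing HTML elements right by hand; I've tried many times. Assuming you have Nokogiri installed...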

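require 'nokogiri'

html_string = ' and then there was a gigantic <a href="link.com/bug.jpg">bug</a> on her nose!'
min_length = 33
# Parse as a fragment so the top-level nodes are the string's own text and
# elements (a full Nokogiri.HTML document wraps them in html > body > p,
# which is why three levels of .children were needed before).
nodes = Nokogiri::HTML.fragment(html_string).children
# Append whole nodes until the result passes min_length, so no tag is split
nodes.reduce('') do |new_string, node|
  break new_string if new_string.length > min_length
  new_string + node.to_html
end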
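If you would rather avoid the Nokogiri dependency, the same "append whole pieces until past min_length" idea can be approximated with a regular-expression tokenizer. This is a swapped-in technique, not a real HTML parser: the helper name excerpt_by_chunks is hypothetical, and it assumes well-formed markup in which every opening tag has a matching close (void tags such as br written as <br/>, no comments).

```ruby
# Hypothetical helper: regex tokenizer, not a real HTML parser.
# Assumes every opening tag is closed (write void tags as <br/>).
def excerpt_by_chunks(html, min_length)
  result = +''
  depth = 0 # how many tags are currently open
  # Each token is either a whole <...> tag or a run of text
  html.scan(/<[^>]*>|[^<]+/) do |token|
    # Only stop at top level, so an opened tag is always closed
    break if depth.zero? && result.length > min_length
    result << token
    if token.start_with?('</')
      depth -= 1
    elsif token.start_with?('<') && !token.end_with?('/>')
      depth += 1
    end
  end
  result
end

html_string = ' and then there was a gigantic <a href="link.com/bug.jpg">bug</a> on her nose!'
p excerpt_by_chunks(html_string, 33)
# => " and then there was a gigantic <a href=\"link.com/bug.jpg\">bug</a>"
```

As in the answer above, min_length is a lower bound: the result can run well past it, because an opened tag is only ever cut after its closing tag.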

Rails: Split an HTML string only between words
