Question

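I have a page that lists news articles. To keep the page short, I only want to display a teaser (the first 200 words / 600 characters of the article) followed by a "more..." link that, when clicked, expands the rest of the article the jQuery/JavaScript way. I have figured that part out, and I even found the following helper method on a paste page, which makes sure the news article (a string) is not cut off in the middle of a word: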

  # Cuts the string to `count` characters and drops the last (possibly partial) word.
  def shorten(string, count = 30)
    if string.length >= count
      shortened = string[0, count]
      splitted = shortened.split(/\s/)
      words = splitted.length
      splitted[0, words - 1].join(" ") + ' ...'
    else
      string
    end
  end

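The problem is that the news article bodies I get from the database are formatted as HTML. So if I'm unlucky, the helper above cuts my article string in the middle of an HTML tag and inserts the "more..." string there (for example, partway through a tag), which corrupts the HTML on my page.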
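To make the failure concrete, here is a minimal sketch that runs the helper from the question against a small piece of markup. The sample HTML and the `count` of 20 are made up for illustration; the helper is the same logic as above, just written more compactly.

```ruby
# Same behavior as the helper from the question.
def shorten(string, count = 30)
  return string if string.length < count

  words = string[0, count].split(/\s/)
  words[0, words.length - 1].join(' ') + ' ...'
end

# Hypothetical article body, used only to trigger the problem.
html = '<p>Read <a href="/news/1">the story</a></p>'
puts shorten(html, 20)
# prints: <p>Read <a ...
```

The cut lands inside the `<a ...>` tag, so the browser receives an unterminated tag plus two elements that are never closed.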

Is there a way around this, or is there a plugin I can use to generate excerpts/teasers from an HTML string?

Answer 1

You can use a combination of Sanitize and Truncate.

truncate("And they found that many people were sleeping better.",
  :omission => "... (continued)", :length => 25)
# => And they f... (continued)

I'm working on a similar task: I have blog posts and I just want a quick excerpt. So in my view I simply do this:

sanitize(truncate(blog_post.body, length: 150))

This strips the HTML tags and gives me the first 150 characters, and it is all handled in the view, so it's MVC-friendly.

Good luck!
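Outside of a Rails view, the same idea can be sketched in plain Ruby. The order matters: if the tags are removed before the text is cut, the cut can never land inside a tag. Both helper names below are hypothetical, and the regex-based `strip_html` is only a crude stand-in for a real sanitizer, so treat this as a rough sketch rather than a replacement for the Rails helpers.

```ruby
# Crude stand-in for a real sanitizer (an assumption for this sketch):
# removes anything that looks like a tag.
def strip_html(html)
  html.gsub(/<[^>]*>/, '')
end

# Hypothetical helper: strip the tags first, then cut at the last space
# that still fits, so neither a tag nor a word is split.
# Assumes length is larger than omission.length.
def plain_text_teaser(html, length = 150, omission = '...')
  text = strip_html(html).gsub(/\s+/, ' ').strip
  return text if text.length <= length

  cut = text[0, length - omission.length]
  cut = cut[0, cut.rindex(' ')] if cut.include?(' ')
  cut + omission
end

html = '<p>And they <em>found</em> that many people were sleeping better.</p>'
puts plain_text_teaser(html, 25)
# prints: And they found that...
```

The result is plain text, so any formatting from the article is lost in the teaser.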

Answer 2

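My answer here should work. The original question (err, asked by me) was about truncating markdown, but I ended up converting the markdown to HTML and then truncating that, so it should work.

Of course, if your site gets a lot of traffic, you should cache the excerpt (perhaps you could store it in the database when the post is created or updated?). That would also mean you could let users modify or enter their own excerpt.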

Usage:

>> puts "<p><b><a href=\"hi\">Something</a></p>".truncate_html(5, "...")
=> <p><b><a href="hi">Somet...</a></b></p>

...and the code (copied from the other answer):

require 'rexml/parsers/pullparser'

class String
  # Truncates HTML to at most `len` characters of text and closes any tags
  # that are still open. `at_end` (for example "...") is appended if given.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []        # elements that are currently open
    new_len = len    # text characters still allowed
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # Keep at most new_len characters of this text node.
        results << p_e[0][0, new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    results << at_end if at_end
    # Close whatever is still open, innermost element first.
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  # Serializes an attribute hash back into ' name="value"' pairs.
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end

Answer 3

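Thanks a lot for your answers! In the meantime, however, I stumbled upon the jQuery HTML Truncator plugin, which does exactly what I need and moves the truncation to the client side. It doesn't get any easier :-)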

Answer 4

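If you don't want to split in the middle of HTML elements, you have to write a more complex parser. It has to remember whether it is in the middle of a <> block and whether it is between two tags.

Even if you do that, you will still run into problems. If someone puts the whole article into a single HTML element, the parser cannot split it anywhere because the closing tag would be missing.

If at all possible, I would try not to put any tags into the articles, or restrict them to tags that don't contain anything (no <div> and so on). That way you only have to check whether you are in the middle of a tag, which is pretty simple: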

  def shorten(string, count = 30)
    if string.length >= count
      shortened = string[0, count]
      splitted = shortened.split(/\s/)
      words = splitted.length
      if splitted[words - 1].to_s.include?("<")
        # The last word contains the start of a tag: drop it and the word before it.
        splitted[0, [words - 2, 0].max].join(" ") + ' ...'
      else
        splitted[0, [words - 1, 0].max].join(" ") + ' ...'
      end
    else
      string
    end
  end
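For comparison, here is a rough sketch of the "more complex parser" described at the start of this answer: it walks the input as a sequence of tags and text runs, counts only the visible text, remembers which tags are open, and closes them after the cut. The name `truncate_html_text` and the `VOID_TAGS` list are made up for this sketch, and the regex tokenizer assumes reasonably well-formed markup (no `>` inside attribute values, no comments containing tags), so it is an illustration of the idea rather than a robust implementation.

```ruby
# Elements that never take a closing tag (partial list, an assumption of this sketch).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical helper: keeps at most `limit` characters of visible text,
# never cuts inside a tag or a word, and closes any tags left open.
def truncate_html_text(html, limit, omission = ' ...')
  open_tags = []
  out = +''
  remaining = limit

  # Split the input into tags (<...>) and the runs of text between them.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?('<')
      name = token[/\A<\/?\s*([a-zA-Z0-9]+)/, 1].to_s.downcase
      if token.start_with?('</')
        open_tags.pop if open_tags.last == name
      elsif !name.empty? && !token.end_with?('/>') && !VOID_TAGS.include?(name)
        open_tags.push(name)
      end
      out << token
    elsif token.length <= remaining
      out << token
      remaining -= token.length
    else
      piece = token[0, remaining]
      # Back up to the last whitespace unless the cut already falls between words.
      piece = piece[0, piece.rindex(/\s/) || 0] unless token[remaining].match?(/\s/)
      out << piece.rstrip << omission
      break
    end
  end

  # Close whatever is still open, innermost element first.
  open_tags.reverse_each { |tag| out << "</#{tag}>" }
  out
end

puts truncate_html_text('<p>The <b>quick brown fox</b> jumps over the lazy dog.</p>', 12)
# prints: <p>The <b>quick ...</b></p>
```

Because the tags are closed after the cut, an article wrapped in a single element can still be shortened.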

Answer 5

I would sanitize the HTML and extract the first sentence. Assuming you have an Article model with a "body" attribute that contains the HTML:

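# lib/core_ext/string.rb
class String
  # Returns everything up to (not including) the first '.', '!' or '?'.
  def first_sentence
    self[/\A[^.!?]+/]
  end
end

# app/models/article.rb
# In Rails 4.2 and later the equivalent class is Rails::Html::FullSanitizer.
def teaser
  HTML::FullSanitizer.new.sanitize(body).first_sentence
end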

This converts "<b>This</b> is an <em>important</em> article! And here is the rest of the article." into "This is an important article".
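The same conversion can be tried outside of Rails with a small stand-alone sketch. The regex-based `strip_html` below is a crude substitute for the sanitizer class (an assumption made so the example runs without Rails), and `first_sentence` is written as a plain method instead of a String extension.

```ruby
# Crude, regex-based stand-in for the Rails sanitizer (an assumption for
# this sketch); a real sanitizer copes with malformed markup far better.
def strip_html(html)
  html.gsub(/<[^>]*>/, '')
end

# Everything up to, but not including, the first '.', '!' or '?'.
def first_sentence(text)
  text[/\A[^.!?]+/]
end

html = '<b>This</b> is an <em>important</em> article! And here is the rest of the article.'
puts first_sentence(strip_html(html))
# prints: This is an important article
```

Note that the sentence detection is naive: any period ends the sentence, so a body starting with "Dr. Smith arrived." yields just "Dr".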

Answer 6

I solved this problem with the following solution.

Install the 'sanitize' gem:

gem install sanitize

Then use the following code, where body is the text that contains the HTML tags.

<%= content_tag :div, Sanitize.clean(truncate(body, length: 200, separator: ' ', omission: "... #{ link_to '(continue)', '#' }"), Sanitize::Config::BASIC).html_safe %>

The excerpt is valid HTML. I hope it helps.

Answer 7

There is now a gem called HTMLTruncator that takes care of this for you. I have used it to display post excerpts and the like, and it is very robust.

Rails: get a teaser/excerpt for an article
