Question

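I have a list page that displays a set of products. Each item has its own well-formed HTML description. I want to display part of each item's description, up to a maximum of 200 characters, not counting HTML tags and attributes.

The problem is that when I cut the HTML string, the result may no longer be well formed (closing tags may be lost, and so on).

Does anyone have an idea for cutting down the length of an HTML string while still producing well-formed HTML?

For example:

The following HTML text is a description: <p class="abc-class">0123456789</p>

If I want to display at most 5 characters, the result I want to see is <p class="abc-class">01234</p>

So how would you do this correctly?

PS: Keep in mind that this is the simplest case.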

Answer 1

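Cutting the HTML itself down to a size is not a good idea because, as you said, you end up with broken HTML. What you want to do instead is reduce the size of the text description. To do that, extract the text you want to display and then cut that text down to the size you want.

On the other hand, why doesn't whatever generates the HTML limit the size of the text in the first place? That way you would not have to worry about pulling the text out of the HTML and trimming it.

That said, it is hard to say more without a code sample.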
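The first suggestion in this answer (extract the text, then cut the text) could be sketched roughly as follows in JavaScript. The name summarizeText is hypothetical, and the regex tag stripper is an assumption that only holds for simple, well-formed markup: it does not decode entities such as &amp;amp; and it misfires on attribute values that contain ">".

```javascript
// Hypothetical helper: reduce an HTML fragment to plain text, then cut it.
// Assumption: a regex is good enough for simple, well-formed markup.
function summarizeText(html, limit, ellipsis = "...") {
  const text = html
    .replace(/<[^>]*>/g, " ") // drop tags, leaving a space so words do not fuse
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
  // Only append the ellipsis when something was actually cut off.
  return text.length <= limit ? text : text.slice(0, limit) + ellipsis;
}

console.log(summarizeText('<p class="abc-class">0123456789</p>', 5)); // 01234...
```

The result is plain text, so the caller has to wrap it in whatever markup the list page needs.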

Answer 2

In "c# Truncate HTML safely for article summary" I answered this question with a link to my gist: https://gist.github.com/2413598

Answer 3

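It can be done (I have done it), but it still leaves the possibility of oddly rendered markup, especially when CSS styles are applied. When I wrote it I did it in JavaScript, but the same approach applies elsewhere, and it involves working with the DOM rather than with strings.

As you can see, it simply walks the tree and counts the text it finds. Once the limit is reached, it truncates any remaining text in the current node (adding an ellipsis as needed), stops processing further children, and removes all following siblings at every ancestor level (the later "uncles", "great-uncles" and so on). This could (and arguably should) be adapted to use a non-mutating approach.

Feel free to use any of the ideas, strategies or code below.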

/*
    Given a DOM node, truncate the contained text at a certain length.
    The truncation happens in a depth-first manner.

    Any nodes that exist past the point where the limit is reached are removed
    (this includes all later children, siblings, cousins and whatever else)
    and the text in the node in which the limit is reached is truncated.

    NOTES:
    - This modifies the original node.
    - This only supports ELEMENT and TEXT node types (other types are ignored)

    This function returns true if the limit was reached.
*/
function truncateNode (rootNode, limit, ellipses) {
    if (arguments.length < 3) {
        ellipses = "..."
    }

    var ELEMENT_NODE = 1
    var TEXT_NODE = 3

    // returns the length found so far.
    // if found >= limit then all FUTURE nodes should be removed
    function truncate (node, found) {
        switch (node.nodeType) {
            case ELEMENT_NODE:
                var child = node.firstChild
                while (child) {
                    found = truncate(child, found)
                    if (found >= limit) {
                        // remove all FUTURE elements
                        while (child.nextSibling) {
                            child.parentNode.removeChild(child.nextSibling)
                        }
                    }
                    child = child.nextSibling
                }
                return found
            case TEXT_NODE:
                var remaining = limit - found
                if (node.nodeValue.length < remaining) {
                    // still room for more (at least one more letter)
                    return found + node.nodeValue.length
                }
                node.nodeValue = node.nodeValue.substring(0, remaining) + ellipses
                return limit
            default:
                // other node types (comments etc.) contribute no text
                return found
        }
    }

    return truncate(rootNode, 0) >= limit
}

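Well, I must be really bored. Here it is in C#. It is pretty much the same. It should still be updated to be non-mutating. Exercise for the reader, blah, blah...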

using System.Xml;

class Util
{

    public static string
    LazyWrapper (string html, int limit) {
        var d = new XmlDocument();
        d.InnerXml = html;
        var e = d.FirstChild;
        Truncate(e, limit);
        return d.InnerXml;
    }

    public static void
    Truncate(XmlNode node, int limit) {
        TruncateHelper(node, limit, 0);
    }

    public static int
    TruncateHelper(XmlNode node, int limit, int found) {
        switch (node.NodeType) {
        case XmlNodeType.Element:
            var child = node.FirstChild;
            while (child != null) {
                found = TruncateHelper(child, limit, found);
                if (found >= limit) {
                    // remove all FUTURE elements
                    while (child.NextSibling != null) {
                        child.ParentNode.RemoveChild(child.NextSibling);
                    }
                }
                child = child.NextSibling;
            }
            return found;
        case XmlNodeType.Text:
            var remaining = limit - found;
            if (node.Value.Length < remaining) {
                // still room for more (at least one more letter)
                return found + node.Value.Length;
            }
            node.Value = node.Value.Substring(0, remaining);
            return limit;
        default:
            return found;
        }
    }

}

Usage and result:

Util.LazyWrapper(@"<p class=""abc-class"">01<x/>23456789<y/></p>", 5)
// => <p class="abc-class">01<x />234</p>
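
The non-mutating adaptation mentioned earlier in this answer might look something like the sketch below. It follows the same depth-first counting as the JavaScript version but builds a truncated copy instead of editing the tree in place. The name truncatedCopy is hypothetical, and the sketch assumes standard DOM nodes (nodeType, nodeValue, childNodes, cloneNode, appendChild). Unlike the original, node types other than element and text are left out of the copy.

```javascript
// Sketch: return a truncated copy of `root`; the original tree is not modified.
// truncatedCopy is a hypothetical name, not part of any library.
function truncatedCopy(root, limit, ellipsis = "...") {
  const ELEMENT_NODE = 1;
  const TEXT_NODE = 3;
  let found = 0; // characters of text copied so far

  function copy(node) {
    const clone = node.cloneNode(false); // shallow copy: no children
    if (node.nodeType === TEXT_NODE) {
      const remaining = limit - found;
      if (node.nodeValue.length < remaining) {
        // still room for more text after this node
        found += node.nodeValue.length;
      } else {
        // the limit is reached inside (or exactly at the end of) this node
        clone.nodeValue = node.nodeValue.slice(0, remaining) + ellipsis;
        found = limit;
      }
    } else if (node.nodeType === ELEMENT_NODE) {
      for (const child of Array.from(node.childNodes)) {
        if (found >= limit) break; // everything after the cut is skipped
        // only element and text children are copied; others are dropped
        if (child.nodeType === ELEMENT_NODE || child.nodeType === TEXT_NODE) {
          clone.appendChild(copy(child));
        }
      }
    }
    return clone;
  }

  return copy(root);
}
```

In a browser this could be called as truncatedCopy(someElement, 200), and the returned node inserted wherever the summary is shown, while someElement keeps its full content.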

Answer 4

I would do it like this:

  string value = "<p class=\"abc-class\">0123456789</p>";
  char[] delimiters = new char[] { '<', '>' };
  // parts[0] = opening tag contents, parts[1] = text, parts[2] = closing tag contents
  // (this only holds for a single element with no nested tags)
  string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
  string value2 = parts[1];
  //
  // here you do what you want to value2
  //

  Console.WriteLine(delimiters[0] + parts[0] + delimiters[1] + value2 + delimiters[0] + parts[2] + delimiters[1]);
  Console.WriteLine(value);

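You split your string, work on the part you are interested in, and then build the string again. You can probably reuse this snippet many times.

Splitting the string this way is faster than using string.split('').

Hope it fits your needs!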

Answer 5

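But are you generating the description yourself, or are you receiving the whole HTML from some other source? If you are generating the product description, I think you should trim it before putting it into the HTML.

Your question does not clearly state that you receive HTML like that from another source, which is why I think the suggestion above is the simplest solution.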
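
If the description is available as plain text before any markup is generated, trimming first could look like this minimal sketch. The names escapeHtml and renderSummary are hypothetical, and the default class name is simply taken from the example in the question.

```javascript
// Hypothetical names; assumes the description exists as plain text
// before it is wrapped in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderSummary(description, limit, cssClass = "abc-class") {
  // Cut first, while the description is still plain text,
  // then escape and wrap, so the markup is always well formed.
  const shortened = description.slice(0, limit);
  return `<p class="${escapeHtml(cssClass)}">${escapeHtml(shortened)}</p>`;
}

console.log(renderSummary("0123456789", 5)); // <p class="abc-class">01234</p>
```

Because the cut happens before escaping, the limit counts visible characters rather than entity characters.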

How to cut down the size of an HTML string
