Question

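Is there a way to get a snippet of text from an HTML document without stripping the formatting (markup)?
Suppose you are given the following document: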

<h3> Hello World </h3>

<p>
You see this world quite often. 
You must take the blue pill here blah blah...
</p>

You want to extract the first 30 characters, keeping the formatting tags:

<h3> Hello World </h3>

<p>
You see this world quite often.
You must take the blue...
</p>

JavaScript/jQuery, Python, or a language-agnostic strategy are all welcome.

Answer 1

With formatting tags

Extract the first 30 characters

With jQuery I would do it like this:

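// Truncate the text of every <p> to its first 30 characters.
// Note: .text() writes plain text, so tags nested inside the <p> are dropped.
$('p').text(function (index, text) {
   return text.slice(0, 30);
});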

If you want to target the p tags inside a specific container, you can do this:

// Same as above, limited to <p> elements inside the element with id "div".
$('#div p').text(function (index, text) {
   return text.slice(0, 30);
});
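
The jQuery calls above work per paragraph and, because they go through .text(), they do not keep markup nested inside a p. If the 30-character budget should run across the whole fragment and every tag should survive, one possible approach is to walk the HTML string, count only text characters, and close whatever tags are still open at the cut. The following is a rough sketch of that idea, not a full HTML parser: truncateHtml is a hypothetical helper name, the tokenizer is a simple regular expression, and it assumes reasonably well-formed markup. It counts whitespace between tags as text, may split an entity such as &amp;, and does not treat comments, script, or style specially.

```javascript
// Hypothetical helper (not from jQuery or any library): truncate an HTML
// string to `limit` text characters while keeping the tags intact.
function truncateHtml(html, limit) {
  // Elements that never have a closing tag, so they are not tracked as open.
  const voidTags = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr']);
  // A token is a tag, a run of text, or a stray "<" (treated as text).
  const tokenPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+|</g;
  const openTags = []; // stack of currently open element names
  let out = '';
  let remaining = limit;
  let match;

  while ((match = tokenPattern.exec(html)) !== null) {
    const token = match[0];
    const tagName = match[1];

    if (tagName !== undefined) {
      // Tags are copied through without counting toward the limit.
      const name = tagName.toLowerCase();
      if (token[1] === '/') {
        if (openTags[openTags.length - 1] === name) openTags.pop();
      } else if (!voidTags.has(name) && !token.endsWith('/>')) {
        openTags.push(name);
      }
      out += token;
    } else if (token.length > remaining) {
      // The text budget runs out inside this text run: cut and stop.
      out += token.slice(0, remaining) + '...';
      break;
    } else {
      out += token;
      remaining -= token.length;
    }
  }

  // Close any elements left open at the cut, innermost first.
  for (let i = openTags.length - 1; i >= 0; i--) {
    out += '</' + openTags[i] + '>';
  }
  return out;
}

console.log(truncateHtml('<h3>Hello</h3><p>blue <b>pill</b> here</p>', 12));
// prints: <h3>Hello</h3><p>blue <b>pi...</b></p>
```

In a browser this could be applied to a container with something like el.innerHTML = truncateHtml(el.innerHTML, 30). For untrusted or messy markup, walking the DOM text nodes instead of the string would likely be more robust.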

Answer 2

Take a look at this GitHub project: https://github.com/viralpatel/jquery.shorten. It does what you want.