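My large HTML file contains typographic (curly) double quotes: they open with “ and close with ”.
How can I extract the plain text inside these double quotes?
Unfortunately, the opening quote and the closing quote are not in the same tag.
My HTML looks like this: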
<p>“And, as best friends, you would have shopped lots of times before, wouldn’t you? You’re best friends?</p>
<p>---Yes but not before that time, not before she gave birth to Shelby we weren’t shopping as much.</p>
<p>Not as much?</p>
<p>---No.”</p>
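What I ultimately want is to strip all the tags that fall inside the double quotes, so that the whole quoted text ends up in a single p tag.
Thanks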
Answer 0 (score: 1)
This should work for you:
var str = '<p>“And, as best friends, you would have shopped lots of times before, wouldn’t you? You’re best friends?</p><p>---Yes but not before that time, not before she gave birth to Shelby we weren’t shopping as much.</p><p>Not as much?</p><p>---No.”</p>';
// get the text between the first opening quote and the last closing quote
var quoted = str.substring(str.indexOf('“') + 1, str.lastIndexOf('”'));
// strip the paragraph tags, leaving a space where two paragraphs met
quoted = quoted.replace(/<\/p>\s*<p>/g, ' ').replace(/<\/?p>/g, '');
console.log(quoted);
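Since the question asks for the quoted text to end up in one p tag rather than as a bare string, here is a minimal sketch of that variant. It assumes the quotes are balanced and not nested, and that the paragraphs are plain p elements; mergeQuotedParagraphs is a name made up for this example.

```javascript
// Hypothetical helper: joins the paragraphs that fall inside one “…” pair.
// Assumes balanced, non-nested curly quotes and plain <p> elements.
function mergeQuotedParagraphs(html) {
  // Match each quoted passage lazily, across line breaks and tags.
  return html.replace(/“[\s\S]*?”/g, function (quoted) {
    // Inside the passage, turn every "</p><p>" boundary into a single space.
    return quoted.replace(/<\/p>\s*<p(?:\s[^>]*)?>/g, ' ');
  });
}

console.log(mergeQuotedParagraphs('<p>“A?</p><p>B.</p><p>C.”</p><p>D</p>'));
// prints: <p>“A? B. C.”</p><p>D</p>
```

Paragraphs outside any quoted passage are left untouched, so this can be run over a file that contains several dialogues.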
Answer 1 (score: 0)
Try this script:
var text = "";
$("p").each(function () {
  text += $(this).text().trim();
});
text = text.substring(1, text.length - 1); // removes the first and last characters (the quotes)
console.log(text);
Here is the fiddle.
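Dropping the first and last characters only works when the quotes are exactly the first and last characters of the combined text. A slightly more defensive sketch looks up the quote positions instead. It takes the paragraph texts as a plain array (for instance collected with jQuery as above); textBetweenQuotes is a hypothetical name.

```javascript
// Hypothetical helper: returns the text between the first “ and the last ”,
// or null when no such pair exists. Paragraphs are joined with a space.
function textBetweenQuotes(paragraphs) {
  const joined = paragraphs.map(function (t) { return t.trim(); }).join(' ');
  const open = joined.indexOf('“');
  const close = joined.lastIndexOf('”');
  if (open === -1 || close <= open) return null;
  return joined.slice(open + 1, close);
}

console.log(textBetweenQuotes(['“A?', 'B.', 'C.”']));
// prints: A? B. C.
```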
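Answer 2 (score: 0)
If your HTML file contains many small dialogues like this one, the following idea may help. First extract the text from the p tags, then group the paragraphs by opening and closing quotes, for example with Array.prototype.reduce. Demo: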
var groups = $('p') // find the paragraph tags
  .toArray() // convert the jQuery collection to a plain array
  .map(function (p) {
    return $(p).text();
  }) // extract the text of each paragraph
  .reduce((function () {
    var collecting = false, buffer = [];
    function begin(txt) { // start a new group
      if (collecting) throw new Error('Incorrect opening quote');
      collecting = true;
      buffer = [txt];
    }
    function end(txt) { // close the current group and return its text
      if (!collecting) throw new Error('Incorrect closing quote');
      buffer.push(txt);
      var joined = buffer.join('\n');
      collecting = false;
      buffer = [];
      return joined;
    }
    return function (arr, text) { // reducer
      var start = text.indexOf('“') >= 0,
          stop = text.indexOf('”') >= 0;
      if (start && stop && !collecting) {
        arr.push(text); // the whole quotation fits in one paragraph
      } else if (start) {
        begin(text);
      } else if (stop) {
        arr.push(end(text));
      } else if (collecting) {
        buffer.push(text); // a paragraph inside an open quotation
      }
      return arr;
    };
  }()), []); // group by quotes
console.log(groups);
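The same grouping idea can be tried without a browser by working on an array of paragraph texts. The following is only a sketch: groupByQuotes is a hypothetical name, and it assumes each paragraph contains at most one opening and one closing quote, in that order.

```javascript
// Hypothetical helper: groups paragraph texts into quotations.
// A group starts at a paragraph containing “ and ends at one containing ”.
function groupByQuotes(texts) {
  const groups = [];
  let buffer = null; // null means we are currently outside a quotation
  for (const text of texts) {
    if (text.includes('“')) {
      if (buffer !== null) throw new Error('Unexpected opening quote');
      buffer = [];
    }
    if (buffer !== null) buffer.push(text); // text outside quotations is skipped
    if (text.includes('”')) {
      if (buffer === null) throw new Error('Unexpected closing quote');
      groups.push(buffer.join('\n'));
      buffer = null;
    }
  }
  return groups;
}

console.log(groupByQuotes(['“A?', 'B.', 'C.”', 'narration', '“D.”']));
// prints: [ '“A?\nB.\nC.”', '“D.”' ]
```

A plain loop is used here instead of reduce only because the state (the current buffer) is easier to follow that way; the grouping rule is the same.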