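I'm trying to remove MS Word formatting information from a textarea, but I can't figure out how to do it. I need to paste some content from MS Word into a text box editor. The text is copied fine, but all of the formatting is copied along with it, so my 300-character sentence balloons into 20,000 characters of formatting. Can anyone suggest what I should do?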
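One common alternative, if the editor only needs the words and not the formatting, is to intercept the `paste` event and read only the `text/plain` flavour from the standard `ClipboardEvent.clipboardData` object. The sketch below keeps that choice in a plain function so it can be exercised without a browser. `pickPlainText`, the editor id `editor`, and the crude tag-stripping fallback are all hypothetical choices for illustration, not part of any library. Note that `document.execCommand` is marked deprecated on MDN, so treat that line as one possible way to insert the text.

```javascript
// Hypothetical helper: given a DataTransfer-like object (anything with a
// getData(format) method), return only the plain-text version of the paste.
function pickPlainText(clipboardData) {
    // getData() returns "" when the requested format is not present.
    var text = clipboardData.getData('text/plain');
    if (text) {
        return text;
    }
    // Fallback (assumption): no plain-text flavour, so crudely strip
    // <style> blocks, HTML comments and then all remaining tags.
    var html = clipboardData.getData('text/html') || '';
    return html
        .replace(/<style[^>]*>[\s\S]*?<\/style\s*>/gi, '')
        .replace(/<!--[\s\S]*?-->/g, '')
        .replace(/<[^>]+>/g, '');
}

// Browser wiring (assumes a contenteditable editor with id "editor"):
// cancel the default paste and insert only the plain text.
if (typeof document !== 'undefined') {
    var editor = document.getElementById('editor');
    if (editor) {
        editor.addEventListener('paste', function (event) {
            event.preventDefault();
            var text = pickPlainText(event.clipboardData);
            document.execCommand('insertText', false, text);
        });
    }
}
```

This throws away all formatting, including bold and lists; if some formatting must survive, cleaning the HTML instead (as in the answer) is the other route.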
OK, after some R&D I have made some progress.
This is the text I copied from the Word document:
Once the user clicks on the Cancel icon for a transaction on the Status of Business, and the transaction is eligible for cancellation, a new screen titled “Cancel Transaction” will appear, with the following fields:
This is what I get from $("#textAreaId").val():
"
Normal
0
false
false
false
EN-US
X-NONE
X-NONE
Once the user clicks on the Cancel icon for a
transaction on the Status of Business, and the transaction is eligible for
cancellation, a new screen titled “Cancel Transaction” will appear, with the
following fields:
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-style-parent:"";
line-height:115%;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-bidi-font-family:"Times New Roman";}
"
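The dump above is full of Word-specific markers such as `MsoNormalTable` and `mso-style-name`. If you only want to run the heavier clean-up when the pasted content really came from Word, a rough heuristic is to look for those markers first. The function name and the marker list are assumptions, taken from what appears in the dump plus the Office namespace URI `urn:schemas-microsoft-com:office`; this is not an official detection method.

```javascript
// Hypothetical heuristic: does this string look like content pasted from MS Word?
function looksLikeWordContent(str) {
    var markers = [
        /\bMso[A-Z][a-zA-Z]*/,                // Word class names: MsoNormal, MsoNormalTable
        /\bmso-[a-z-]+\s*:/i,                 // Word-only CSS properties: mso-style-name:
        /urn:schemas-microsoft-com:office/i   // Office XML namespace in Word's HTML
    ];
    // True as soon as any one marker matches.
    return markers.some(function (re) {
        return re.test(str);
    });
}
```

A false negative here only means the clean-up is skipped, so it is safer to keep the marker list short and specific than to match loosely.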
Answer:
I finally found a solution. Here it is:
// Removes MS Office generated guff from pasted HTML
function cleanHTML(input) {
    // 1. Replace line breaks and Word's class="Mso..." attributes with a space
    var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g;
    var output = input.replace(stringStripper, ' ');

    // 2. Strip Word-generated HTML comments (including multi-line ones)
    var commentStripper = /<!--[\s\S]*?-->/g;
    output = output.replace(commentStripper, '');

    // 3. Remove these tags but keep their content, if any
    var tagStripper = /<(\/)*(meta|link|span|\?xml:|st1:|o:|font)(.*?)>/gi;
    output = output.replace(tagStripper, '');

    // 4. Remove these elements completely, from the opening tag through
    //    the matching closing tag (e.g. <style>...</style>)
    var badTags = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];
    for (var i = 0; i < badTags.length; i++) {
        var elementStripper = new RegExp(
            '<' + badTags[i] + '\\b[^>]*>[\\s\\S]*?</' + badTags[i] + '\\s*>', 'gi');
        output = output.replace(elementStripper, '');
    }

    // 5. Remove unwanted attributes such as ' style="..."'; Word uses single
    //    quotes when the value itself contains double quotes, so match both
    var badAttributes = ['style', 'start'];
    for (var j = 0; j < badAttributes.length; j++) {
        var attributeStripper = new RegExp(
            ' ' + badAttributes[j] + '=("[^"]*"|\'[^\']*\')', 'gi');
        output = output.replace(attributeStripper, '');
    }

    return output;
}
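Because step 1 turns every line break into a space (and turns `<p class=MsoNormal>` into `<p >`), while steps 3 and 4 delete tags without touching the whitespace around them, the output of `cleanHTML` can still contain long runs of spaces. A small follow-up pass is one way to tidy that up. `collapseWhitespace` is a hypothetical helper, not part of the answer above, and it assumes you do not need to preserve spacing inside `<pre>` blocks and that any literal `>` in the text is escaped as `&gt;`.

```javascript
// Hypothetical follow-up to cleanHTML(): squeeze leftover whitespace.
function collapseWhitespace(html) {
    return html
        .replace(/\s+/g, ' ')   // any run of whitespace -> a single space
        .replace(/ >/g, '>')    // '<p >' left behind when step 1 removes class=Mso...
        .trim();                // drop leading/trailing space
}
```

For example, `collapseWhitespace(cleanHTML(pastedHtml))` would run both passes in order.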