Question

我正在尝试从文本区域中删除MSWord格式信息，但不知道如何执行此操作。情况就像我需要将MSWord中的一些内容粘贴到文本框编辑器中。它被复制得很好，但问题是所有的格式也被复制，所以我的300个字符句子扩展到20000个字符格式的句子。任何人都可以建议我做什么？

好的一些R＆amp; D完成后我已达到某个阶段。

这是我从Word文档中复制的文本

Once the user clicks on the Cancel icon for a transaction on the Status of Business, and the transaction is eligible for cancellation, a new screen titled “Cancel Transaction” will appear, with the following fields:

这是我在$（“＃textAreaId”）中获得的内容.val（）

"

  Normal
  0




  false
  false
  false

  EN-US
  X-NONE
  X-NONE




























Once the user clicks on the Cancel icon for a
transaction on the Status of Business, and the transaction is eligible for
cancellation, a new screen titled “Cancel Transaction” will appear, with the
following fields: 



 /* Style Definitions */
 table.MsoNormalTable
    {mso-style-name:"Table Normal";
    mso-style-parent:"";
    line-height:115%;
    font-:11.0pt;"Calibri","sans-serif";
    mso-bidi-"Times New Roman";}

"

Answer 1

我终于找到了解决方案这是它

// removes MS Office generated guff
function cleanHTML(input) {
  // 1. remove line breaks / Mso classes
  var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g; 
  var output = input.replace(stringStripper, ' ');
  // 2. strip Word generated HTML comments
  var commentSripper = new RegExp('<!--(.*?)-->','g');
  var output = output.replace(commentSripper, '');
  var tagStripper = new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font)(.*?)>','gi');
  // 3. remove tags leave content if any
  output = output.replace(tagStripper, '');
  // 4. Remove everything in between and including tags '<style(.)style(.)>'
  var badTags = ['style', 'script','applet','embed','noframes','noscript'];

  for (var i=0; i< badTags.length; i++) {
    tagStripper = new RegExp('<'+badTags[i]+'.*?'+badTags[i]+'(.*?)>', 'gi');
    output = output.replace(tagStripper, '');
  }
  // 5. remove attributes ' style="..."'
  var badAttributes = ['style', 'start'];
  for (var i=0; i< badAttributes.length; i++) {
    var attributeStripper = new RegExp(' ' + badAttributes[i] + '="(.*?)"','gi');
    output = output.replace(attributeStripper, '');
  }
  return output;
}

Jquery从文本区域中删除MS Word格式

1 个答案: