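This is for PHP. The input is sometimes just plain text with line breaks, and sometimes simple HTML with br and p tags.
Sometimes it gets really messy, with all the mumbo jumbo that MS Word brings along. Right now I use a few if statements and an HTML decode before my regex replace function kicks in. The end result is clean, except that it removes the line breaks from my original user input!
Here is a messy sample: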
<!--
[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>dbb</o:Author>
<o:Version>12.00</o:Version>
</o:DocumentProperties>
<o:OfficeDocumentSettings>
<o:AllowPNG/>
<o:TargetScreenSize>1024x768</o:TargetScreenSize>
</o:OfficeDocumentSettings>
</xml><![endif]
-->
<p class="MsoNormal" style="text-align:justify;mso-pagination:none">Linda S. Agnew is a partner in the Firm’s <a href="http://www.jaspanllp.com/practice-group/15/litigation">Litigation Practice Group</a> and its Appellate Practice Group. She has extensive experience in trial and appellate advocacy with an emphasis in commercial litigation and title insurance defense claims. Ms. Agnew has handled numerous complex commercial litigation matters including land use, corporate dissolution, shareholder derivative actions and litigation involving real property. She also has substantial experience in municipal law and land use. </p>
<p class="MsoNormal" style="text-align:justify;mso-pagination:none">Ms. Agnew co-authored the updated Chapter 17 of the Real Estate Titles treatise published by the New York State Bar Association in 2007 entitled "Adverse Possession.” She presently serves on the Board of Directors of the National Association of Women Business Owners, Long Island Chapter, and is active in her church. Ms. Agnew was a recipient of the 2008 Public Service Attorney of the Year Award presented by the Touro College Jacob D. Fuchsberg Law Center as well as a recipient of the 2009 Long Island Business News 40 under 40 award. </p>
<p class="MsoNormal" style="text-align:justify;mso-pagination:none">Ms. Agnew received her Juris Doctor from St. John's University School of Law and her Bachelor of Arts, magna cum laude, from Long Island University. During law school, Ms. Agnew was a senior member of the Moot Court Honor Society where she competed in several State and National Moot Court Competitions. She also interned with New York City’s Corporation Counsel where she was assigned to the Civil Torts Division. <br />
</p>
<p class="MsoNormal" style="text-align:justify;mso-pagination:none">Ms. Agnew is admitted to practice law in the State Courts of New York and New Jersey, the United States Court of Appeals for the Second Circuit and the United States District Courts for the Eastern and Southern Districts of New York and the District of New Jersey. She is a member of the New York State Bar Association (Real Property Law Section), the Nassau County Bar Association, the Suffolk County Bar Association and the Nassau County Women’s Bar Association</p>
<!--
[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:TrackMoves/>
<w:TrackFormatting/>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>EN-US</w:LidThemeOther>
<w:LidThemeAsian>X-NONE</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
<w:Compatibility>
Here is my simple parse-and-replace for this case:
bio = WebUtility.HtmlDecode(Regex.Replace(teammember.bio, "<!--(.|\n)*?-->", string.Empty));
if ( bio.Contains("<p")){
bio = Regex.Replace( bio ,"\r|\n|<p(.|\n)*?>", string.Empty);
bio = Regex.Replace( bio ,"</p(.)*?>", "\r\n\r\n");
} else {
bio = Regex.Replace( bio ,"\r|\n", string.Empty);
bio = Regex.Replace( bio ,"<br(.)*?>", "\r\n");
}
bio = Regex.Replace( bio ,"<li(.)*?>", "• ");
bio = Regex.Replace( bio ,"</li(.)*?>", "\r\n");
bio = Regex.Replace( bio ,"<(.|\n)*?>", string.Empty);
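And this is my current output:
Linda S. **Agnewis** a partner in the Firm’s Litigation Practice Group and its Appellate Practice Group. She has extensive experience in trial and appellate advocacy with an emphasis in commercial litigation and title insurance defense claims. Ms. Agnew has handled numerous complex commercial litigation matters including land use, **corporatedissolution**, shareholder derivative actions and litigation involving **realproperty**. She also has substantial experience in municipal law and land use.
Note the words in bold: they have been run together.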
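
One possible reading of the output, sketched in C# to match the snippet: if the stored bio contains line breaks inside a paragraph (an assumption, since the sample as pasted here shows each paragraph on one line), then replacing `\r|\n` with `string.Empty` removes the only whitespace between two words and joins them. The class and method names below are hypothetical, and the second method is just one variant that replaces each line break with a space before stripping tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Same pattern as the first replacement in the <p> branch of the question:
    // line breaks and opening <p ...> tags are deleted outright.
    public static string StripAsInQuestion(string html) =>
        Regex.Replace(html, "\r|\n|<p(.|\n)*?>", string.Empty);

    // Hypothetical variant: turn every line break into a single space first,
    // then strip the opening <p ...> tags.
    public static string StripKeepingWordBreaks(string html)
    {
        string spaced = Regex.Replace(html, "\r\n|\r|\n", " ");
        return Regex.Replace(spaced, "<p(.|\n)*?>", string.Empty);
    }

    public static void Main()
    {
        // Assumed input: a line break sits between "Agnew" and "is".
        string sample = "<p class=\"MsoNormal\">Linda S. Agnew\nis a partner</p>";

        Console.WriteLine(StripAsInQuestion(sample));      // Linda S. Agnewis a partner</p>
        Console.WriteLine(StripKeepingWordBreaks(sample)); // Linda S. Agnew is a partner</p>
    }
}
```

With this variant, a line break that follows an existing space leaves a double space behind, so a later pass that collapses runs of spaces may be needed.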