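I am using the regular expression below to remove the text between two words. It works, except that it skips some of the matches. An example is pasted below.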
var EditedHtml = Regex.Replace(htmlText, @"<script(.*?)</script>", "");
htmlText:
<head>
<script src=" https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js" type="text/javascript"></script>
<script src=" https://ajax.googleapis.com/ajax/libs/jqueryui/1.8.18/jquery-ui.min.js" type="text/javascript"></script>
<script src="/AspellWeb/v2/js/dragiframe.js" type="text/javascript"></script>
<script type="text/javascript">
var applicationName = '/';
FullPath = (applicationName.length > 1) ? 'http://localhost:65355' + applicationName : 'http://localhost:65355';
//FullPath = 'http://localhost:65355';
GetPath = function (url) {
return FullPath + url;
}
</script>
<script type="text/javascript" src="../../Scripts/stats.js?"></script>
</head>
<body>
.......
<script type="text/javascript">
function loadAndInit() {
$(".dvloading").hide();
if ($.browser.mozilla) {
if (location.pathname == "/Stats/Reports") { // This is for local env.
$("#prntCss").attr("href", "../../../Content/SitePrint_FF.css");
}
else { // This is for DEV/QA/STAGE/PROD env.
$("#prntCss").attr("href", "../../Content/SitePrint_FF.css");
}
}
}
</script>
</body>
EditedHtml:
<head>
<script type="text/javascript">
var applicationName = '/';
FullPath = (applicationName.length > 1) ? 'http://localhost:65355' + applicationName : 'http://localhost:65355';
//FullPath = 'http://localhost:65355';
GetPath = function (url) {
return FullPath + url;
}
</script>
</head>
<body>
.......
<script type="text/javascript">
function loadAndInit() {
$(".dvloading").hide();
if ($.browser.mozilla) {
if (location.pathname == "/Stats/Reports") { // This is for local env.
$("#prntCss").attr("href", "../../../Content/SitePrint_FF.css");
}
else { // This is for DEV/QA/STAGE/PROD env.
$("#prntCss").attr("href", "../../Content/SitePrint_FF.css");
}
}
}
</script>
</body>
Answer 0 (score: 4)
Why use a regex to parse HTML? See this.
It is much easier with a real HTML parser such as HtmlAgilityPack:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(filename); //or doc.LoadHtml(HtmlString)
doc.DocumentNode.Descendants()
.Where(n => n.Name == "script").ToList()
.ForEach(s => s.Remove());
StringWriter wr = new StringWriter();
doc.Save(wr);
var newhtml = wr.ToString();
Answer 1 (score: 2)
Try it in single-line mode:
var EditedHtml = Regex.Replace(
htmlText, @"<script(.*?)</script>", "",
RegexOptions.Singleline);
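Quoting the documentation:
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).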
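The effect of the flag can be seen in a small console program. This is only a minimal sketch: the class name SinglelineDemo, the helper StripScripts, and the sample HTML are made up for illustration and are not part of the question. It runs the question's pattern with and without RegexOptions.Singleline on input that contains one single-line script and one multi-line script.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: removes <script>...</script> blocks using the
    // pattern from the question and the given regex options.
    public static string StripScripts(string html, RegexOptions options)
    {
        return Regex.Replace(html, @"<script(.*?)</script>", "", options);
    }

    public static void Main()
    {
        // One script that fits on a single line, one that spans several lines.
        string html =
            "<head>\n" +
            "<script src=\"a.js\"></script>\n" +
            "<script>\nvar x = 1;\n</script>\n" +
            "</head>";

        string withoutFlag = StripScripts(html, RegexOptions.None);
        string withFlag = StripScripts(html, RegexOptions.Singleline);

        // Without the flag, '.' stops at '\n', so the multi-line block survives.
        Console.WriteLine(withoutFlag.Contains("var x = 1;")); // True
        // With the flag, '.' also matches '\n', so every script block is removed.
        Console.WriteLine(withFlag.Contains("<script"));        // False
    }
}
```

This mirrors the behaviour in the question: the scripts that sit on one line are removed, while the blocks that span several lines are skipped until the flag is added.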
Answer 2 (score: 2)
Try
var EditedHtml = Regex.Replace(
htmlText, @"<script(.*?)</script>", "", RegexOptions.Singleline
);
Use single-line mode so that . matches any character, including newlines.
Answer 3 (score: 0)
Try this:
//(.|\n)*: matches every character, including newlines, zero or more times ('.' alone matches everything except \n, so this also covers \r\n)
//(.|\n)*?: as few times as possible => you get rid of the <script> tags and their content but keep the rest of your html
var EditedHtml = Regex.Replace(htmlText, @"<script(.|\n)*?</script>", "");
Hope that helps.