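I am trying to extract a specific tag from HTML. (I know from reading this site that you shouldn't try to parse HTML with regular expressions, but I only need one specific tag that follows a very specific order.)
This is the regular expression (tested in Expresso), and it should work perfectly: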
(?<ExternalSource2>\<eds2[\s.]+url\=\"?(?<Url>[\w\./:\?=&\+%\d_-]+)\"?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)
The problem appears when I try to use it in C# with this code:
Regex re = new Regex(@"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");
string Input = @"width: 662px; height: 60px; vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr></tbody></table><table style=""width: 662px; border-collapse: collapse""><tbod";
foreach (Match m in re.Matches(Input)) {
    HttpContext.Current.Response.Write(string.Format("Match : {0}<br/>", m));
    short i = 0;
    foreach (Group g in m.Groups) {
        HttpContext.Current.Response.Write(string.Format("Group {0} = {1}<br/>", i++, g.Value));
    }
    HttpContext.Current.Response.Write("<br/><br/>");
}
It produces this result:
Match : PlaceLogo
Group 0 = PlaceLogo
Group 1 = PlaceLogo
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo
This is not at all what I expected.
With the code below, the result is closer to what I expect (but still not quite right):
Regex re = new Regex(@"eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>");
Result:
Match : eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo
Group 0 = eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo
Group 1 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 2 = PlaceLogo
The expected output is:
Match : <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 0 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 1 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo
Any help is appreciated.
Answer 0 (score: 0)
I cannot reproduce your problem with the sample code. It produces the following output:
Match : <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 0 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 1 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo
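For reference, here is a minimal sketch of how such a check could be run outside a web page, so that no browser is involved in displaying the result. The class name `Eds2Repro` and the helper `FirstMatch` are made up for illustration, and the input is a shortened version of the string from the question; the pattern itself is copied unchanged. Reading the groups by name avoids depending on their numeric order.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical console harness; Eds2Repro and FirstMatch are illustrative names.
public static class Eds2Repro
{
    // Pattern copied unchanged from the question.
    public static readonly Regex Pattern = new Regex(
        @"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");

    // Returns the first match of the pattern in the given input.
    public static Match FirstMatch(string input)
    {
        return Pattern.Match(input);
    }

    public static void Main()
    {
        // Shortened version of the input string from the question.
        string input = @"vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr>";

        Match m = FirstMatch(input);

        // Written to the console, so the text is shown exactly as matched.
        Console.WriteLine("Match : {0}", m.Value);
        Console.WriteLine("ExternalSource2 = {0}", m.Groups["ExternalSource2"].Value);
        Console.WriteLine("Url = {0}", m.Groups["Url"].Value);
        Console.WriteLine("Text = {0}", m.Groups["Text"].Value);
    }
}
```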
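Please clarify your question.
UPDATE
I think your problem is the following:
You are writing the match result directly to your response stream without escaping it. This means the browser interprets it as HTML rather than showing it as text, which is what you want. Because <eds2> is an unknown tag, the browser renders only its content, which is why you see just PlaceLogo.
You should change your code to: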
Regex re = new Regex(@"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");
string Input = @"width: 662px; height: 60px; vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr></tbody></table><table style=""width: 662px; border-collapse: collapse""><tbod";
foreach (Match m in re.Matches(Input))
{
    // HtmlEncode expects a string, so pass m.Value rather than the Match itself.
    HttpContext.Current.Response.Write(string.Format("Match : {0}<br/>",
        Server.HtmlEncode(m.Value)));
    short i = 0;
    foreach (Group g in m.Groups)
    {
        HttpContext.Current.Response
            .Write(string.Format("Group {0} = {1}<br/>", i++,
                Server.HtmlEncode(g.Value)));
    }
    HttpContext.Current.Response.Write("<br/><br/>");
}
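To see what the encoding step does to a matched tag, here is a small standalone sketch. It uses `System.Net.WebUtility.HtmlEncode` instead of `Server.HtmlEncode` only so that it can run outside ASP.NET; the class name `EncodeDemo` and the helper `ForBrowser` are made up for illustration. The encoded string contains no raw `<`, `>`, `&` or `"` characters, so a browser displays the tag as text instead of interpreting it.

```csharp
using System;
using System.Net;

// Hypothetical demo class; EncodeDemo and ForBrowser are illustrative names.
public static class EncodeDemo
{
    // Escapes the characters that a browser would otherwise treat as markup.
    public static string ForBrowser(string raw)
    {
        return WebUtility.HtmlEncode(raw);
    }

    public static void Main()
    {
        // The tag from the question, as the regex would return it.
        string raw = @"<eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2>";

        Console.WriteLine("Raw     : {0}", raw);
        Console.WriteLine("Encoded : {0}", ForBrowser(raw));
    }
}
```

Written unencoded into a page, the raw string shows up as just PlaceLogo; the encoded form shows up as the full tag text.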