Problem using the greater-than sign in a Regex, C#

Time: 2011-06-27 14:09:36

Tags: c# regex

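I'm trying to extract one specific tag from HTML. (I know from reading this site that you shouldn't try to parse HTML with regular expressions, but I only need one specific tag that follows a very specific format.)

Here is the regular expression (tested in Expresso), which should work perfectly: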

(?<ExternalSource2>\<eds2[\s.]+url\=\"?(?<Url>[\w\./:\?=&\+%\d_-]+)\"?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)

The problem appears when I try to use it in C# with this code:

Regex re = new Regex(@"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");

        string Input = @"width: 662px; height: 60px; vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr></tbody></table><table style=""width: 662px; border-collapse: collapse""><tbod";

        foreach (Match m in re.Matches(Input)) {
            HttpContext.Current.Response.Write(string.Format("Match : {0}<br/>", m));
            short i = 0;
            foreach (Group g in m.Groups) {
                HttpContext.Current.Response.Write(string.Format("Group {0} = {1}<br/>", i++, g.Value));
            }
            HttpContext.Current.Response.Write("<br/><br/>");
        }

It produces this result:

Match : PlaceLogo
Group 0 = PlaceLogo
Group 1 = PlaceLogo
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo

This is not at all what I expected.

With the code below, the result is closer to what I expect (but still not quite right):

    Regex re = new Regex(@"eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>");

Result:

Match : eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo
Group 0 = eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo
Group 1 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 2 = PlaceLogo

The expected output is:

Match : <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>

Group 0 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>

Group 1 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>

Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147

Group 3 = PlaceLogo

Any help is appreciated.

1 answer:

Answer 0: (score: 0)

I can't reproduce your problem with the sample code. It produces the following output:

Match : <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>

Group 0 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>

Group 1 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>

Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147

Group 3 = PlaceLogo

Please clarify your question.

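Update

I think your problem is the following: you write the match result directly to your response stream without escaping it. The browser therefore interprets it as HTML instead of displaying it as text, which is what you want. Because <eds2 ...> is an unknown tag, the browser renders nothing for the tag itself and shows only its content, PlaceLogo. You should change your code to: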

Regex re = new Regex(@"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");

string Input = @"width: 662px; height: 60px; vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr></tbody></table><table style=""width: 662px; border-collapse: collapse""><tbod";

foreach (Match m in re.Matches(Input))
{
    // HtmlEncode expects a string, so pass m.Value rather than the Match object.
    HttpContext.Current.Response.Write(string.Format("Match : {0}<br/>",
                                                     Server.HtmlEncode(m.Value)));
    short i = 0;
    foreach (Group g in m.Groups)
    {
        HttpContext.Current.Response
                           .Write(string.Format("Group {0} = {1}<br/>", i++, 
                                                Server.HtmlEncode(g.Value)));
    }
    HttpContext.Current.Response.Write("<br/><br/>");
}
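
The effect of the encoding step can also be checked outside ASP.NET. The following is only a minimal sketch under a few assumptions: it runs as a console program, it uses System.Net.WebUtility.HtmlEncode as a stand-in for Server.HtmlEncode, and the class name EncodeDemo, the helper FirstTag and the shortened input string are made up for illustration. The pattern is the one from the question, unchanged.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EncodeDemo
{
    // Pattern copied from the question, unchanged.
    private const string Pattern =
        @"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)";

    // Hypothetical helper: returns the raw text of the first match,
    // or an empty string if the pattern does not match.
    public static string FirstTag(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Shortened, made-up input in the same shape as the question's HTML.
        string input = @"middle""><eds2 url=""http://www.someurl.co.uk/page.aspx?TagID=PlaceLogo"">PlaceLogo</eds2></td>";

        string raw = FirstTag(input);

        // The raw match already contains the complete <eds2 ...>...</eds2> tag.
        Console.WriteLine(raw);

        // After encoding, '<' becomes "&lt;" and '>' becomes "&gt;",
        // so a browser displays the tag as text instead of parsing it.
        Console.WriteLine(WebUtility.HtmlEncode(raw));
    }
}
```

Printing both lines to a console shows that the regex itself matches the whole tag; only the unencoded write to the response stream makes the tag disappear in the browser.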